I. FIELD OF INVENTION

The present invention relates to the field of personalization systems. More 
particularly, the present invention relates to a method and system for generating client 
preference recommendations in a high performance computing regime. 

II. BACKGROUND OF THE INVENTION 

In a conventional transaction in which a client selects a good or service, the client 
generally has a set of preferences associated with a similar or dissimilar set of goods or 
services. In a mathematical sense, one can make a one-to-one correspondence between 
the set of preferences and the set of goods or services. For example, given the ordered set 
of goods or services: 

{book "A1", film "A2", restaurant "A3", . . . , music "A200"}
where "A1"-"A200" denote, for example, specific products or services, and given client
"U", the ordered preferences may be expressed as, for example, Boolean quantities:
U's preferences:

{book "A1": yes, film "A2": not yes, restaurant "A3": yes, . . . , music "A200": yes}
This may be expressed in the shorthand form, using "1" to denote "yes", "0" to denote
"not yes", and "U" to denote "U's preferences" as a row vector containing 200 entries:
U = {1, 0, 1, . . . , 1}

Suppose that there is an additional item of interest: book "A201" of which U's
Boolean preference is unknown: "0". Denoting book "A201" in the ordered set as one
increment to the right, one may express this as:
U = {1, 0, 1, . . . , 1, 0}

Suppose, further, that the following preferences for other clients, Z1-Z1000, are
known and encompass book "A201":
Z1 = {0, 1, 0, . . . , 1, 1}
Z2 = {1, 1, 1, . . . , 0, 1}
Z3 = {1, 1, 0, . . . , 1, 0}
. . .
Z1000 = {0, 0, 1, . . . , 1, 1}

Personalization systems are generally designed in order to provide a robust
recommendation for client U regarding item A201, based upon known preferences of
client U as well as known preferences of other clients Z1-Z1000. The use of
conventional personalization systems or recommendation systems in E-commerce is
described in "Recommender Systems in E-Commerce," Proceedings of the ACM
Conference on Electronic Commerce, Nov. 3-5, 1999, by Schafer et al.

One skilled in the art will appreciate that the utility of a recommendation system
is driven by the method used for determining the amount of correlation that exists
between the votes for two or more items, or by the amount of correlation that exists
between the votes of two or more clients. There are a number of ways of determining
correlation, for example, as discussed in "An Algorithmic Framework for Performing
Collaborative Filtering," Proceedings of the 1999 Conference on Research and
Development in Information Retrieval, Aug. 1999, by Herlocker et al. Such methods
include, for example, the computation of Pearson correlations, as used in the
GROUPLENS system, the calculation of Spearman rank correlation coefficients, or a
least-squares comparison.

A basic problem with conventional recommendation systems, however, is directly
related to the issue of combinatorial explosion. The volume of data collected from clients
engaged in E-commerce is outpacing the conventionally applied computational ability to
rapidly process such preferences and generate accurate recommendations. Although the
introductory examples articulated in this document represent relatively trivial matrices
(i.e. 201 x 201 matrices, or 1,000 x 1,000 matrices), in actual practice one must be able to
work with matrices of the order of 1,000,000 x 1,000,000 and higher. In light of the
foregoing, it remains desirable to introduce a system and method that can accurately
process large ratings-matrices in a rapid fashion so as to generate accurate
recommendations.

Another concern with conventional systems is related to the desire to preserve
client data privacy. With such a large amount of data being processed for a given client,
it is desirable to have a system and method that will not allow one to reconstruct the
original data set from the disclosed portion of the recommendation model.



III. SUMMARY OF THE INVENTION

Accordingly, in a first embodiment of the present invention, a method of 
providing a recommendation to a user comprises: providing a sparse ratings matrix, 
forming a plurality of data structures representing the sparse ratings matrix, forming a 
runtime recommendation model from the plurality of data structures, determining a 
recommendation from the runtime recommendation model in response to a request from a
user, and providing the recommendation to the user. 

In a second embodiment of the present invention, a method of providing a
recommendation to a user comprises: providing a sparse ratings matrix, providing an
update ratings data structure, forming a plurality of data structures representing the sparse
ratings matrix, forming a runtime recommendation model from the plurality of data
structures and the update ratings data structure, determining a recommendation from the
runtime recommendation model in response to a request from a user, and providing the
recommendation to the user.

In a third embodiment of the present invention, a method of providing a
recommendation to a user comprises: providing a sparse ratings matrix, forming a
plurality of data structures representing the sparse ratings matrix, forming a first
recommendation model from said plurality of data structures, perturbing the first
recommendation model to generate a runtime recommendation model, determining a
recommendation from the runtime recommendation model in response to a request from a
user, and providing the recommendation to the user.

In a fourth embodiment of the present invention, a method of providing a
recommendation to a user comprises: providing a sparse ratings matrix, forming a
plurality of data structures representing the sparse ratings matrix, forming a first
recommendation model from the plurality of data structures, truncating the first
recommendation model to generate a runtime recommendation model, determining a
recommendation from the runtime recommendation model in response to a request from a
user, and providing the recommendation to the user.

In a fifth embodiment of the present invention, a method of providing a
recommendation to a user comprises: providing a first ratings matrix, providing a second
ratings matrix, forming a runtime recommendation model from the cross-set
co-occurrences of the first ratings matrix and the second ratings matrix, determining a
recommendation from the runtime recommendation model in response to a request from a
user, and providing the recommendation to the user.

Further still, in a sixth embodiment of the present invention, a method of
providing a recommendation to a user comprises determining a recommendation from a
recommendation model using a multiplicity voting scheme, which may be personalized
or may be anonymous.

Additional features and advantages of the invention will be set forth in the
description that follows, and in part will be apparent from the description, or may be
learned by practice of the invention. The objectives and other advantages of the
invention will be realized and attained by the process and apparatus particularly pointed
out in the written description and claims herein as well as the appended drawings.



IV. BRIEF DESCRIPTION OF DRAWINGS 

The accompanying drawings, which are incorporated in and constitute a part of
this specification, illustrate an implementation of the invention and, together with the
description, serve to explain the advantages and principles of the invention. In the
drawings,

FIG. 1 depicts a recommendation scheme of the prior art in which a base model is
not modified before generating a preference recommendation through an on-line or
runtime model;

FIG. 2 depicts a recommendation scheme consistent with the present invention in
which additional data allows the construction of a perturbed on-line model and generates
a modified runtime recommendation model;

FIG. 3 schematically indicates exemplary relationships between various processes
of the present invention and various helper functions in preferred embodiments;

FIG. 4 depicts a system configuration consistent with the present invention in 
which a runtime recommendation system cooperates with an off-line recommendation 
system; 

FIG. 5 is a schematic depiction of a method consistent with a first embodiment of
the present invention;

FIG. 6 depicts a conventional node structure of the prior art in a distributed
computing process;

FIG. 7 is a schematic depiction of a method consistent with a second embodiment
of the present invention;

FIG. 8 is a schematic depiction of alternative methods consistent with a third or 
fourth embodiment of the present invention; 

FIG. 9 is a schematic depiction of a method consistent with a fifth embodiment of
the present invention; and

FIG. 10 depicts an example from the prior art of a transformation to compressed
row format.



V. DETAILED DESCRIPTION 

Reference will now be made in detail to an implementation consistent with the
present invention as illustrated in the accompanying drawings. Whenever possible, the
same reference number will be used throughout the drawings and the following
description to refer to the same or like parts.

V.A. Term Definitions

As used herein, the symbol $\Re$ indicates a set. An array of the form

$$
R = \begin{pmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & \cdots & r_{1,m} \\
r_{2,1} & r_{2,2} & r_{2,3} & \cdots & r_{2,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{n,1} & r_{n,2} & r_{n,3} & \cdots & r_{n,m}
\end{pmatrix}
$$

with entries $r_{i,j} \in \Re$ is called an $n \times m$ matrix over $\Re$, with $n$ rows and $m$
columns. One skilled in the art should appreciate that the matrices in question are such
that n and m are finite but not bounded. For example, new rows and columns are often
added. This implies that calculations are preferably performed over infinite matrices with
the property that all entries are zero except for those entries in a finite number of rows
and columns.
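
By way of illustration only, a matrix that is finite but not bounded may be sketched as a coordinate map in which only non-zero entries are stored, so that new rows and columns can be added without resizing. The class name `SparseRatings` and its methods are hypothetical and are not taken from this specification.

```python
from collections import defaultdict


class SparseRatings:
    """Ratings matrix stored as {row: {col: value}}; absent entries are zero.

    Hypothetical helper: indices are 1-based and may grow without bound.
    """

    def __init__(self):
        self.rows = defaultdict(dict)

    def set(self, u, i, value=1):
        """Store a non-zero entry, or drop the entry when value is zero."""
        if value:
            self.rows[u][i] = value
        else:
            self.rows[u].pop(i, None)

    def get(self, u, i):
        """Return the entry at (u, i); anything never stored reads as zero."""
        return self.rows.get(u, {}).get(i, 0)

    def shape(self):
        """Smallest (n, m) that contains every non-zero entry."""
        n = max((u for u, r in self.rows.items() if r), default=0)
        m = max((i for r in self.rows.values() for i in r), default=0)
        return n, m


R = SparseRatings()
R.set(1, 1)
R.set(2, 3)
R.set(1000, 201)   # a new client and a new item extend the matrix
```

Reading an entry outside the stored region simply returns zero, which mirrors the convention that all but finitely many rows and columns are zero.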

As used herein, a row from a matrix, for example $R$, is a $1 \times m$ matrix. If $i$
indicates the row in question, then

$$ \begin{pmatrix} r_{i,1} & r_{i,2} & r_{i,3} & \cdots & r_{i,m} \end{pmatrix} $$

is the $i$th row of $R$. Similarly, as used herein, a column from a matrix is an $n \times 1$ matrix.
If $j$ indicates the column in question, then

$$ \begin{pmatrix} r_{1,j} \\ r_{2,j} \\ r_{3,j} \\ \vdots \\ r_{n,j} \end{pmatrix} $$

is the $j$th column of $R$.

As used herein, a vector is a $1 \times m$ or an $n \times 1$ matrix. The above statement
regarding unbounded matrices applies to vectors as well. Accordingly, one skilled in the
art should appreciate that definitions of operations are depicted over finite ranges for
convenience only.

As used herein, rating indicates an entry of a "ratings matrix" defined below. The
presence of an entry is an indication that a relationship exists between a given client and a 
given item. 

As used herein, a ratings matrix is a collection of numerical values indicating a 
relationship between a plurality of clients and a plurality of items. In general, and as 
indicated earlier, one may denote this as:

$$
R = R_{u,i} = \begin{cases} 1 & \text{if client } u \text{ votes favorably for item } i \\ 0 & \text{otherwise} \end{cases}
$$

where $u \in U$, the set of all clients, and $i \in I$, the set of all items. One skilled in the art
should appreciate that "votes favorably" as used above may correspond to a variety of
acts. For example, a favorable vote may correspond to client u purchasing item i, or it
may correspond to client u literally expressing a favorable interest in item i. Again, item
i itself is not limited to goods but may also correspond to services.

As used herein, the notation $A_{i,*}$ will denote the $i$th row of matrix $A$, and $A_{*,j}$
will denote the $j$th column of $A$. Further still, it is useful to speak of vectors as if they
were sets and vice-versa. One skilled in the art should be able to discern which is being
referred to by the context of the operations performed. If one considers a column of the
ratings matrix as the set of clients having an entry of 1 in that column, then, given
any two items $i$ and $j \in I$, the set of clients having voted favorably for both
items is given by:

$$
\begin{aligned}
R_{*,i} \cap R_{*,j} &= \{u \in U \mid R_{u,i} = 1\} \cap \{u \in U \mid R_{u,j} = 1\} \\
&= \{u \in U \mid R_{u,i} = 1,\; R_{u,j} = 1\} \\
&= \{u \in U \mid R_{u,i} R_{u,j} = 1\}
\end{aligned}
$$

Furthermore, if the cardinality of the set is taken, then the following is derived:

$$
\#\{R_{*,i} \cap R_{*,j}\} = \sum_{u \in U} R_{u,i} R_{u,j} = \sum_{u \in U} R^{t}_{i,u} R_{u,j} = (R^{t} R)_{i,j}
$$

The above relationship indicates that the dot product of the two columns from the 
ratings matrix is a sum over the number of co-rates between two items. Performing this 
for all possible pairs yields an item-item matrix of co-rates. 
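
For purposes of illustration only, the counting of co-rates described above may be sketched as follows for a small dense matrix of 0/1 ratings. The function name `co_rates` and the sample data are assumptions made for this sketch; a practical system would operate on a sparse representation.

```python
def co_rates(R):
    """Return the item-item matrix of co-rates for a 0/1 ratings matrix R.

    R is a list of client rows; entry (i, j) of the result is the number of
    clients who voted favorably for both item i and item j.
    """
    m = len(R[0])
    M = [[0] * m for _ in range(m)]
    for row in R:                                   # one client at a time
        rated = [i for i, v in enumerate(row) if v]
        for i in rated:
            for j in rated:
                M[i][j] += 1                        # this client co-rated i and j
    return M


# Four clients (rows) and three items (columns); hypothetical data.
R = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
M = co_rates(R)
```

Here `M` evaluates to `[[3, 1, 2], [1, 2, 1], [2, 1, 2]]`: items 0 and 2 share two clients, and each diagonal entry is the total number of votes for that item.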

As used herein, an item-item model may be constructed by computing the matrix
$_{(I-I)}M = R^{t} R$, where the superscript "t" indicates a transposed matrix, and the
pre-subscript "(I-I)" on M indicates an item-item model. The item-item model indicates the
correlation between two items for which preference ratings are known. The diagonal
portion of $_{(I-I)}M$, for example the entry at row i and column i, corresponds to the total
number of votes for item i. Furthermore, the number of clients having co-rated any
item-item pair is given by the respective entries from the matrix $R^{t} R$.

Further still, given any two clients, the number of co-rated items between them is
given by the respective entry of $R R^{t}$. Both symmetric forms are of interest to the types of
problems that will be discussed herein.

Accordingly, and as used herein, a client-client model may be constructed by
computing $_{(C-C)}M = R R^{t}$, where the pre-subscript "(C-C)" indicates "client-client." As
before, the diagonal entries of the above matrix indicate how many favorable votes the
corresponding client made.

As used herein, the item-item model $_{(I-I)}M$ and the client-client model $_{(C-C)}M$ will be
denoted in general as M.

One skilled in the art should appreciate that, given a ratings matrix $R$, then $R^{t} R$ as
well as $R R^{t}$ are symmetric as previously noted. For any given row of $R^{t} R$ or $R R^{t}$, the
diagonal entry is the largest entry in the row. This is made apparent by considering that
for each $i$, one has for all $j$ that

$$ R_{*,i} \cap R_{*,j} \subseteq R_{*,i} $$

In addition, given any row i, the value of the diagonal term is the number of non-zero
entries in the ith column of R. Therefore, given any column index, i, of R the ith
row (or column) of $R^{t} R$ or $R R^{t}$ induces a relative scaling on all column indices of R.
One may order the column indices according to this scaling, if it is decided how to order
between indices that have the same relative ranking. One suitable manner is to decide
uniformly between equivalently ranked indices of a row of $R^{t} R$ or $R R^{t}$.

Attorney Docket No. 7744-0061
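
As a non-limiting sketch, the symmetry of both models and the dominance of the diagonal entry in each row may be checked numerically on a small example. The helper names `gram`, `transpose`, and `diagonal_dominates` and the sample ratings are hypothetical.

```python
def gram(rows):
    """Matrix of dot products between every pair of the given 0/1 vectors."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in rows] for ra in rows]


def transpose(R):
    """Swap rows and columns of a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*R)]


def diagonal_dominates(M):
    """True if M is symmetric and each diagonal entry is the largest in its row."""
    n = len(M)
    symmetric = all(M[a][b] == M[b][a] for a in range(n) for b in range(n))
    return symmetric and all(M[a][a] == max(M[a]) for a in range(n))


# Four clients (rows) and three items (columns); hypothetical data.
R = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
item_item = gram(transpose(R))    # co-rates between items (columns of R)
client_client = gram(R)           # co-rates between clients (rows of R)
```

Both `diagonal_dominates(item_item)` and `diagonal_dominates(client_client)` hold for this data, as expected for any 0/1 ratings matrix.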

As used herein, unary data indicates ratings data in which there are only two
types of information: positive and no information. Such data sources are usually encoded 
with rating values of either zero or one. It is customary to let zero express no information 
since such use produces a sparse data set. 

As used herein, interest data indicates ratings data in which there is a scaling to 
the positive interest of a rating. One skilled in the art should appreciate that the range of 
values is bounded such that each value is finite. 

As used herein, Likert data indicates ratings data in which there is a scaling that
includes both positive interest and a degree of possible dislike. 

As used herein, co-rate indicates either a co-rate of clients, or a co-rate of items. 
These two senses are analogous to each other. One skilled in the art should appreciate 
that two items are said to co-rate each other if and only if there exists a client that has 
rated both of these items. Therefore, it is permissible to have an item co-rate itself. 
Further still, two clients are said to co-rate each other if and only if there exists an item 
that both clients have rated. 

V.B. Functional Definitions 

The general process of generating a recommendation from a recommendation 
model consistent with embodiments of the present invention is discussed in this section. 
As used herein, the function Index(*) operating on a row of a matrix (a vector) sets the
row's co-rate with itself to zero. For example, the row:

$$ M_{i=12,*} = \{0,0,1,0,4,2,0,0,0,3,0,6,1,0,0,0\} $$

yields:

$$ \mathrm{Index}(M_{i=12,*}) = \{0,0,1,0,4,2,0,0,0,3,0,0,1,0,0,0\} $$

As used herein, the top-k co-rate for a row is denoted by $\mathrm{Index}_k$. For example, if
$k = 3$,

$$ \mathrm{Index}_{k=3}(M_{i=12,*}) = \{0,0,0,0,4,2,0,0,0,3,0,0,0,0,0,0\} $$

Notice that if one is interested in the top four co-rates, then a problem of breaking
up ties would arise. This is a problem of a local degeneracy within a row. One may
break this local degeneracy in a number of ways. For example, the global popularity of
the items in question yields several approaches, two of which are to select the
most globally popular or the least globally popular. One skilled in the art should
understand that when a top-k row vector is discussed, a method for breaking ties (or
breaking such degeneracies) is implied.
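
The following is a minimal sketch, offered for illustration only, of zeroing a row's self co-rate and keeping its top-k co-rates with ties broken by global popularity. The function names, the `popularity` list, and the final fallback to the lower position are assumptions of this sketch and are not prescribed by the specification.

```python
def index(row, self_pos):
    """Copy of the row with its own co-rate (1-based position self_pos) zeroed."""
    out = list(row)
    out[self_pos - 1] = 0
    return out


def top_k(row, k, popularity=None, prefer_popular=True):
    """Keep the k largest non-zero entries of row and zero the rest.

    Ties are broken by a hypothetical global popularity score per position,
    then by the lower position.
    """
    pop = popularity or [0] * len(row)
    sign = -1 if prefer_popular else 1
    order = sorted(
        (j for j, v in enumerate(row) if v),
        key=lambda j: (-row[j], sign * pop[j], j),
    )
    keep = set(order[:k])
    return [v if j in keep else 0 for j, v in enumerate(row)]


def index_k(row, self_pos, k, **tie_break):
    """Top-k co-rates of a row after removing the row's co-rate with itself."""
    return top_k(index(row, self_pos), k, **tie_break)


M12 = [0, 0, 1, 0, 4, 2, 0, 0, 0, 3, 0, 6, 1, 0, 0, 0]
top3 = index_k(M12, 12, 3)

# For the top four, positions 3 and 13 tie with a co-rate of 1.
popularity = [0] * 16
popularity[2], popularity[12] = 10, 50          # hypothetical popularity scores
top4_popular = index_k(M12, 12, 4, popularity=popularity)
top4_unpopular = index_k(M12, 12, 4, popularity=popularity, prefer_popular=False)
```

With these scores, `top4_popular` retains position 13 and `top4_unpopular` retains position 3, while `top3` is unaffected by the tie.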

As used herein, the operator that returns the top-k values is denoted by $\mathrm{TOP}_k$. For
$k = 3$, one has

$$ \mathrm{TOP}_{k=3}(M_{i=12,*}) = \{0,0,0,0,4,0,0,0,0,3,0,6,0,0,0,0\} $$

In general, the collection of methods covered by the M-model approach maps a
row of M to a vector using a function of $\mathrm{Index}_k(M_{i,*})$ and some statistics of M. This is
usually done to scale the ranking induced by the co-rate matrix. This will be discussed in
more detail later, but the most basic of operations is to set all the non-zero co-rates to 1.
As used herein, this operation is denoted by Unary and in this example

$$ \mathrm{Unary}(\mathrm{Index}_{k=3}(M_{i=12,*})) = \{0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0\} $$

Because of its common use in the on-line recommendation, the above operation
will be denoted herein in shortened form by $\mathrm{Unary}_k(*)$. If any of these operators
acts on a matrix, it is defined herein to return a matrix in which the operator acts on each
row of the input matrix.
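
By way of example only, the top-k operator, the Unary operator, and their row-wise action on a matrix may be sketched as below. The function names are hypothetical, and ties are resolved toward the lower position purely as an assumption of this sketch.

```python
def top_k(row, k):
    """Keep the k largest non-zero entries of row; ties go to the lower position."""
    order = sorted((j for j, v in enumerate(row) if v), key=lambda j: (-row[j], j))
    keep = set(order[:k])
    return [v if j in keep else 0 for j, v in enumerate(row)]


def unary(row):
    """Set every non-zero co-rate to 1."""
    return [1 if v else 0 for v in row]


def unary_k(row, self_pos, k):
    """Zero the self co-rate (1-based self_pos), keep the top k, then apply unary."""
    indexed = [0 if j == self_pos - 1 else v for j, v in enumerate(row)]
    return unary(top_k(indexed, k))


def unary_k_matrix(M, k):
    """Apply unary_k to each row of a square co-rate matrix M."""
    return [unary_k(row, i + 1, k) for i, row in enumerate(M)]


M12 = [0, 0, 1, 0, 4, 2, 0, 0, 0, 3, 0, 6, 1, 0, 0, 0]
top3_raw = top_k(M12, 3)             # self co-rate of 6 is still present
top3_unary = unary_k(M12, 12, 3)     # self co-rate removed before ranking
```

The difference between `top3_raw` and `top3_unary` shows why the self co-rate is removed first: otherwise the diagonal entry, which is always the largest in its row, would occupy one of the k slots.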

As used herein, the diagonal operator, D, is an overloaded operator in the
following sense: if D operates on a square matrix then the return is a vector whose terms
are from the diagonal of the respective matrix; alternatively, if D operates on a vector then
it returns a diagonal matrix whose non-zero diagonal entries correspond to the respective
vector. For example, given a vector $\vec{v}$, D is defined herein by

$$
D(\vec{v})_{i,j} = \begin{cases} (\vec{1}^{\,t}\, \vec{v})_{i,j} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}
$$

where $\vec{1} = \{1,1,1, \ldots\}$ is a row vector of all 1's. Furthermore, given a square matrix $A$,
then

$$ D(A) = D(A)_{i} = A_{i,i} $$
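
A minimal sketch of the overloaded diagonal operator follows, assuming dense Python lists for both vectors and matrices. The dispatch on the type of the first element is an implementation choice of this sketch only.

```python
def D(x):
    """Overloaded diagonal operator.

    Square matrix (list of lists) -> vector of its diagonal entries.
    Vector (flat list)            -> diagonal matrix built from the vector.
    """
    if x and isinstance(x[0], (list, tuple)):
        if any(len(row) != len(x) for row in x):
            raise ValueError("D expects a square matrix")
        return [x[i][i] for i in range(len(x))]
    return [[v if i == j else 0 for j in range(len(x))] for i, v in enumerate(x)]


votes_per_item = D([[3, 1, 2], [1, 2, 1], [2, 1, 2]])   # diagonal of a co-rate matrix
as_matrix = D(votes_per_item)                            # back to a diagonal matrix
```

Applying `D` twice to a vector returns the original vector, which is a convenient sanity check on the two branches.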

As used herein, a multiplicity voting recommendation scheme returning a
maximum of $k'$ elements from $S$ and $k$-neighbors is given by

$$
\mathrm{Unary\_Multi\_Vote}_{k'}(S, k) = \mathrm{TOP}_{k'}\left( {\sum_{x_i \in S}}' \, \mathrm{Unary}_k(M_{x_i,*}) \right)
$$

where the primed summation indicates a summation of the unique entries of S and where
$S = \{x_1, x_2, \ldots, x_p\} \in X^{p}$. As defined herein, the variable $x_m$ may represent an item
when using the item-item model $_{(I-I)}M_{*,*}$, or it may represent a client when using the
client-client model $_{(C-C)}M_{*,*}$.

As used herein, a non-unary version of this scheme may be expressed as

$$
\mathrm{Multi\_Vote}_{k'}(S, k) = \mathrm{TOP}_{k'}\left( {\sum_{x_i \in S}}' \, \mathrm{Index}_k(M_{x_i,*}) \right)
$$
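
For illustration only, the unary and non-unary multiplicity voting schemes may be sketched as follows. The function names, the 0-based indexing, the tie-breaking toward the lower position, and the sample co-rate matrix are assumptions of this sketch.

```python
def top_k(row, k):
    """Keep the k largest non-zero entries of row; ties go to the lower position."""
    order = sorted((j for j, v in enumerate(row) if v), key=lambda j: (-row[j], j))
    keep = set(order[:k])
    return [v if j in keep else 0 for j, v in enumerate(row)]


def index_k(M, x, k):
    """Top-k co-rates of row x of M (0-based) with the self co-rate zeroed."""
    row = list(M[x])
    row[x] = 0
    return top_k(row, k)


def multi_vote(M, S, k, k_prime, unary=True):
    """Sum the k-neighbor rows of the unique members of S, then keep the top k'."""
    votes = [0] * len(M)
    for x in set(S):                                 # primed sum: unique entries only
        for j, v in enumerate(index_k(M, x, k)):
            votes[j] += (1 if v else 0) if unary else v
    return top_k(votes, k_prime)


# Item-item co-rates for four items; hypothetical data.
M = [
    [3, 1, 2, 0],
    [1, 2, 1, 1],
    [2, 1, 2, 0],
    [0, 1, 0, 1],
]
unary_votes = multi_vote(M, [0, 1, 0], k=2, k_prime=2)
weighted_votes = multi_vote(M, [0, 1, 0], k=2, k_prime=2, unary=False)
```

In both variants item 2 receives the most votes from the set {0, 1}; the repeated 0 in the input is counted once because the primed summation runs over unique entries.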

Suppose that $_{(I-I)}M = R^{t} R$ is a base model from which one constructs an on-line
model, $_{(I-I)}M_{r(k)} = \mathrm{Unary}_k(_{(I-I)}M)$. The derived model is computed so that
$\mathrm{Unary\_Multi\_Vote}_{k'}(S, k)$ may be computed more efficiently as

$$
\mathrm{Unary\_Multi\_Vote}_{k'}(S, k) = \mathrm{TOP}_{k'}\left( {\sum_{x_i \in S}}' \, \left( M_{r(k)} \right)_{x_i,*} \right)
$$

where r(k) indicates the runtime model's dependence on the parameter k. This equation
represents the unperturbed recommendation system using anonymous recommendations.

For personalized recommendation one may use

$$ \mathrm{Unary\_Multi\_Vote}_{k'}(R_{u,*}, k) \quad \text{where } u \in U $$
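
As a hedged sketch of the efficiency argument above, the runtime model may be computed once off-line so that an on-line request only sums precomputed rows. The function names, the 0-based indexing, and the sample data are hypothetical.

```python
def top_k(row, k):
    """Keep the k largest non-zero entries of row; ties go to the lower position."""
    order = sorted((j for j, v in enumerate(row) if v), key=lambda j: (-row[j], j))
    keep = set(order[:k])
    return [v if j in keep else 0 for j, v in enumerate(row)]


def build_runtime_model(M, k):
    """Off-line step: unary top-k neighbors of every row of the base model M."""
    model = []
    for x, row in enumerate(M):
        indexed = [0 if j == x else v for j, v in enumerate(row)]
        model.append([1 if v else 0 for v in top_k(indexed, k)])
    return model


def unary_multi_vote(runtime_model, S, k_prime):
    """Anonymous recommendation: sum the runtime rows of the unique members of S."""
    votes = [0] * len(runtime_model)
    for x in set(S):
        for j, v in enumerate(runtime_model[x]):
            votes[j] += v
    return top_k(votes, k_prime)


def personalized(runtime_model, R, u, k_prime):
    """Personalized recommendation: S is the set of items rated in row u of R."""
    S = [i for i, v in enumerate(R[u]) if v]
    return unary_multi_vote(runtime_model, S, k_prime)


# Four clients by four items, and the corresponding item-item co-rates.
R = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
M = [[3, 1, 2, 0], [1, 2, 1, 1], [2, 1, 2, 0], [0, 1, 0, 1]]

runtime = build_runtime_model(M, k=2)           # computed once, off-line
anonymous = unary_multi_vote(runtime, [0, 1], k_prime=2)
for_client_3 = personalized(runtime, R, u=3, k_prime=2)
```

The on-line work per request is proportional to the number of items in S times the row length, since ranking of neighbors has already been done when `runtime` was built.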

Suppose that there are multiple sources of ratings data. These different data sets
may represent transactions other than purchases. For example, one set might be client
purchases and another might be demographic data for these clients. As another example,
the data may represent different divisions from a given company, different companies'
data, purchases in specified categories, etc. Therefore, suppose that one has a sequence of
ratings matrices that represent different dimensions

$$ R^{(1)}, R^{(2)}, R^{(3)}, \ldots $$

and that these matrices represent data for a common set of clients. Specifically, suppose
that the $i$th row of each matrix represents the same client. As used herein, a ratings
matrix of augmented matrices comprises

$$ \mathcal{R} = \begin{pmatrix} R^{(1)} & R^{(2)} & R^{(3)} & \cdots \end{pmatrix} $$

Computing the matrix of co-rates between different dimensions as $\mathcal{R}^{t} \mathcal{R}$ gives a block
matrix whose blocks are given by

$$ (\mathcal{R}^{t} \mathcal{R})_{i,j} = (R^{(i)})^{t} R^{(j)} $$

Therefore, if one wants to know the top-k co-rated members from dimension j for
dimension i, one determines

$$ \mathrm{Unary}_k\left( (R^{(i)})^{t} R^{(j)} \right) $$

The case i = j is the case where there is one source of ratings data R described above. In
this case

$$ \mathrm{Unary}_k(R^{t} R) = M_{r(k)} $$

As used herein, the runtime model of co-rates between dimensions i and j is

$$ M(i,j)_{r(k)} = \mathrm{Unary}_k\left( (R^{(i)})^{t} R^{(j)} \right) $$

In this situation, if a recommendation is from dimension i, then this recommendation, as
used herein, is referred to as an i-recommendation. Furthermore, if dimensions i and j are
both being considered for recommendations then it is referred to as an
{i,j}-recommendation, etc. Furthermore, i-ratings, as used herein, refer to the use of
ratings data from dimension i as input. As an example, suppose one wants to make
j-recommendations from i-ratings for client u. Extending the previously defined
approach, this is given by

$$
\mathrm{Unary\_Multi\_Vote}(j)_{k'}\left( R^{(i)}_{u,*}, k \right) = \mathrm{TOP}_{k'}\left( {\sum_{x \in R^{(i)}_{u,*}}}' \, \left( M(i,j)_{r(k)} \right)_{x,*} \right)
$$
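
By way of a non-limiting sketch, recommendations in one dimension from ratings in another may be computed as below for two small matrices that share the same client rows. The function names and sample data are hypothetical, and the sketch assumes that no self co-rate needs to be zeroed when the two dimensions differ.

```python
def top_k(row, k):
    """Keep the k largest non-zero entries of row; ties go to the lower position."""
    order = sorted((j for j, v in enumerate(row) if v), key=lambda j: (-row[j], j))
    keep = set(order[:k])
    return [v if j in keep else 0 for j, v in enumerate(row)]


def cross_co_rates(Ri, Rj):
    """Entry (a, b) counts clients rated for member a of dimension i and b of j."""
    M = [[0] * len(Rj[0]) for _ in range(len(Ri[0]))]
    for row_i, row_j in zip(Ri, Rj):            # same client in both matrices
        for a, va in enumerate(row_i):
            if va:
                for b, vb in enumerate(row_j):
                    if vb:
                        M[a][b] += 1
    return M


def cross_runtime_model(Ri, Rj, k):
    """Unary top-k of each row of the cross co-rate block (dimensions assumed distinct)."""
    return [[1 if v else 0 for v in top_k(row, k)] for row in cross_co_rates(Ri, Rj)]


def j_recommendations(model, Ri, u, k_prime):
    """Vote over dimension j using client u's ratings in dimension i."""
    votes = [0] * len(model[0])
    for a, v in enumerate(Ri[u]):
        if v:
            for b, w in enumerate(model[a]):
                votes[b] += w
    return top_k(votes, k_prime)


# Three clients; dimension i is a demographic segment, dimension j is purchases.
segments = [[1, 0], [1, 0], [0, 1]]
purchases = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]

model = cross_runtime_model(segments, purchases, k=1)
rec_for_client_2 = j_recommendations(model, segments, u=2, k_prime=1)
```

Client 2 has no purchase ratings consulted here; the recommendation is driven entirely by the segment the client belongs to.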

VI. OVERVIEW OF THE PRESENT INVENTION 

Data may be represented in a variety of forms and may correspond to a variety of
items of interest. One of the objects of a recommendation model, however, is to draw out
correlations in the data between items to aid profitability. The present invention, in a
general sense, uses data to build a recommendation model in order to provide
personalized recommendations. In particular, the present invention, in one embodiment,
involves constructing multiple recommendation models and, from the collection of
models, solving a particular client's problem. For example, recommendation models may
be characterized as on-line (or runtime) and/or off-line. Further, the on-line
recommendation model may be constructed from an off-line model. In certain instances,
the off-line model may be better suited for batch processing of recommendations,
because performance is less of an issue than that of the on-line equivalent. The usage
scenario described below is an example of this situation.

Furthermore, even the off-line model may be only a portion of a larger model.
One of the aspects of this approach is that a taxonomy of recommendations may be
constructed using various models produced from data sources in an off-line manner. To
be more precise, the various models are produced in a manner independent of their use in
making personalized recommendations. In this manner, the runtime models are
constructed as though in a memory cache in which data has been ordered and as much
pre-computation as possible has occurred in anticipation of the final on-line calculations
required at runtime.

In summary, it is beneficial to have a methodology that allows one to derive
models from previously constructed models, and from which there can be incremental
updates, thereby providing current and accurate knowledge of the data represented. These
models, in turn, may be used in either an off-line or on-line fashion to construct
recommendations. One skilled in the art should appreciate that one of the benefits of the
present invention is the development of a consistent interpretation of correlated data as it
relates to doing business across multiple interactions with a client. Additional data
derived from such interactions may then be fed back into existing data, thereby allowing
the process of model creation to incrementally update the collection of models to form
the most up-to-date and accurate knowledge for both off-line and on-line (runtime)
processing. The accuracy of such a collection of models is a measure of at least two
aspirations: (i) the ability to correctly represent all data sources contributing to the
models; and (ii) the ability to correctly represent the current (or runtime)
intentions of the recommender (for example, the marketer or the reseller).

VI.A. Usage scenario

For exemplary purposes only, suppose that a widget reseller has an Internet site or
a call center at which clients may buy a plurality of widgets. Furthermore, the following
information is considered known: (1) historical order data; (2) categorization of the
widgets; and (3) profit margins for each widget.

Suppose that the widget reseller has a recommendation model, which provides
client recommendations with respect to the plurality of widgets offered for sale. At some
point in time, the widget reseller determines that a first widget is overstocked. Thus, the
widget reseller needs to sell the first widget. Accordingly, and based on the updated
information regarding the first widget, the widget reseller would like to: (i) determine
likely previous clients to target, using, for example, direct mailing, e-mail, or the
telephone; and (ii) if a client comes to the Internet site of the widget reseller and the
reseller determines that the client is a strong candidate to buy a first widget, detect this
event and recommend a first widget to the client.

Accordingly, various embodiments of the present invention perform these two
actions from a common framework while: (i) using the existing recommendation model
for making recommendations as a base and deriving a modified model that represents the
need to sell the first widget; and (ii) changing the current working model for an on-line
recommendation to reflect this need. That is to say, if the widget reseller would typically
recommend a second widget under the base model, with the caveat that the second widget
is part of a recommendation category in the base model that also contains the first widget,
then the widget reseller may prefer to recommend the first widget in the modified
recommendation model in place of the recommendation for the second widget under the
base model.

Furthermore, suppose that the profit margin for the first widget is median among
the range of profit margins for other widgets that may be likewise recommended under a
base model. In such an instance, and under a modified recommendation model, the
widget reseller may want to replace a recommendation for all those widgets with a profit
margin less than that of the first widget with a recommendation for the first widget.
Further still, the widget reseller may want to recommend the first widget in place of
recommendations for widgets whose profit margin is higher than that of the first widget,
under a modified recommendation model. One skilled in the art should appreciate that this
may be desirable in the case where one does not want to ignore high profit margins at the
expense of removing an overstocked, lower profit margin widget.
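
Purely as a hypothetical illustration of the substitution rules in this usage scenario, and not as the perturbation method of the invention, the replacement of base recommendations by an overstocked widget may be sketched as follows. The function name, the dictionaries `category` and `margin`, and the widget identifiers are all invented for this sketch.

```python
def perturb_recommendations(recs, promoted, category, margin=None,
                            only_lower_margin=False):
    """Substitute the promoted item for base recommendations in its category.

    With only_lower_margin=True, only items whose profit margin is below that
    of the promoted item are replaced. The base list itself is left unchanged,
    so the unperturbed recommendations remain available.
    """
    out = []
    for item in recs:
        same_category = category[item] == category[promoted]
        lower = (margin is None or not only_lower_margin
                 or margin[item] < margin[promoted])
        replace = same_category and lower and item != promoted
        candidate = promoted if replace else item
        if candidate not in out:                # never recommend an item twice
            out.append(candidate)
    return out


# Hypothetical catalogue: w1 is the overstocked first widget.
category = {"w1": "gears", "w2": "gears", "w3": "gears", "w4": "springs"}
margin = {"w1": 0.20, "w2": 0.10, "w3": 0.40, "w4": 0.30}
base = ["w2", "w3", "w4"]

replace_all = perturb_recommendations(base, "w1", category)
replace_lower = perturb_recommendations(base, "w1", category, margin,
                                        only_lower_margin=True)
```

The first call replaces every same-category recommendation with the first widget, while the second keeps the higher-margin widget `w3` in place.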

FIG. 1 depicts a recommendation scheme of the prior art where a 
recommendation model is not modified to reflect additional data, such as where a given 
item, say a first widget, should preferably be recommended more often or less often. 

FIG. 2 depicts a modified situation consistent with the present invention where 
the dashed arrow schematically indicates the influence of additional data 60 as a result of 
introducing perturbed on-line model 210. Such a modification of base model 110, for
example, uses additional data 60 to generate a marketing campaign for an item, say first
widget, and perturbed on-line model 210 derived from the base model 110. As discussed
in more detail below, one skilled in the art should appreciate that the perturbation may be 
undone at a time when it is decided to revert back to the base model. For example, at 
some time in the future (after the overstocked first widgets have been sold) the base 
model may be accepted as accurately reflecting ordinary, unperturbed buying patterns. 

The above scenario is exemplary only, and is intended to illustrate one use of the
described invention. Additionally, in the discussion that follows, one skilled in the art 
should appreciate that the examples disclosed herein are expressed as dense matrices 
and/or dense vectors for purposes of readability. However, the methods and systems 
disclosed herein relate generally to sparse matrices and/or sparse vectors. 

FIG. 3 indicates the relationship between the various processes consistent with the
present invention. In particular, FIG. 3 is an exemplary schematic of the flow of data in
an embodiment of the present invention, comprising the processes of: (1) preprocessing
data 55; (2) scheduling 40; (3) loading a model 405; (4) adding rating(s) 65; (5)
initializing 150; (6) updating 140; (7) perturbing model 35; (8) making a personalized
recommendation 310; and (9) making an anonymous recommendation 320. FIG. 3
further illustrates the partitioning of off-line processing region 105 and runtime
processing region 400.

Each of the above processes is described in more detail below. Items (1) through
(4) above, in a preferred embodiment of the present invention, pertain to processing that
assists the overall function of model creation. Items (5) through (9), on the other hand,
constitute portions of embodiments of the present invention.



VI.B. Summary of helper processes

As described above, the helper processes comprise the steps of: preprocessing
data; scheduling; loading a model; and adding ratings.



VI.B.1. Preprocessing data

Client data may exist in a variety of possible formats, any one of which may not
be directly usable by the system. Furthermore, there may be multiple data sources that
collectively embody the "ratings data" or the "sparse ratings data." This data must be
converted to a format that is suitable for the system, as indicated by preprocessing box 55
in FIG. 3, so that further processing steps may use this data in its compressed sparse
representation. Section XII.C below discusses the general forms data may take in more
detail.

VI.B.2. Scheduling 

In a preferred embodiment of the present invention, scheduler 40, as indicated 
schematically in FIG. 3, is a functional unit of the present invention that enables 
processing to be initiated after which the instantiation of the system occurs. 

VI.B.3. Loading a model and adding rating(s) 

Because of a separation between on-line model processing (or runtime 
processing) and off-line model processing in various embodiments of the present 
invention, one skilled in the art should appreciate the step of loading the runtime model 
used in on-line processing. This is indicated schematically by runtime model loader 405 
in FIG. 3. Furthermore, the ability to add additional ratings to the models described by 
the present invention is incorporated in its design and is indicated schematically as add 
ratings box 65 in FIG. 3. In one preferred embodiment of the present invention, the 
additional ratings may originate as additional data from the on-line or runtime processing 
region 400. Furthermore, the arrow connecting ratings matrix data 170 and personal 
recommendation 310 indicates that, in certain instances, personal recommendations may 
be directly implemented from ratings matrix data 170. 

FIG. 4 depicts a system configuration consistent with the present invention in 
which runtime recommendation system 550 cooperates with off-line recommendation 
system 520 over network 510. Both runtime recommendation system 550 and off-line 
recommendation system 520 include processors as well as memory. In particular, off-line
recommendation system 520 includes memory 540 for the storage of sparse matrix
545 information, and memory 530 for the storage of rules 535 for off-line processing. 
Likewise, runtime recommendation system 550 includes memory 560 for the storage of 
runtime model 560 and memory 570 for the storage of rules 575 for runtime processing. 
In a preferred embodiment runtime recommendation system 550 may form a portion of a 
data processing device with conventionally limited memory capabilities such as a 
personal digital assistant (PDA) or a mobile phone. In practice, however, one skilled in 
the art will appreciate that runtime recommendation system 550 may form a part of any 
data processing device such as a personal computer, workstation, or mainframe. 
Furthermore, the depiction of network 510 between runtime recommendation system 550 
and off-line recommendation system 520 is exemplary only. That is, one skilled in the 
art should appreciate that runtime recommendation system 550 and off-line 
recommendation system 520 may form different processing and memory portions of the 
same data processing device. 

VII. FIRST EMBODIMENT OF THE PRESENT INVENTION 

In a first embodiment of the present invention, depicted schematically in FIG. 5, a 
method of providing a recommendation to a user comprises: providing a sparse ratings 
matrix, forming a plurality of data structures representing the sparse ratings matrix, 
forming a runtime recommendation model from the plurality of data structures, 
determining a recommendation from the runtime recommendation model in response to a 
request from a user, and providing the recommendation to the user. 

In one example, discussed below, the plurality of data structures corresponds to
the partitioning of a ratings matrix into a plurality of sub-space matrices, where one of the
plurality of sub-space matrices is manipulated either singly or with a second sub-space
matrix to produce a recommendation. For example, the plurality of sub-space ratings
matrices may correspond to a plurality of categories. In FIG. 3, this is schematically
depicted as initialize box 150, and in FIG. 5, this corresponds to step 610.

In general, step 605 of FIG. 5 includes retrieving or otherwise receiving data
corresponding to a sparse ratings matrix. As mentioned above, step 610 initializes the
sparse ratings matrix for further processing by forming a plurality of data structures
representing the sparse ratings matrix.

In initializing sparse matrices of the present invention (step 610), the off-line
model creation as described herein is based on computing the products $R^{t}R$ and $RR^{t}$.
Below, it is shown that the task of model creation may be decomposed into constituent
building blocks that are computed from matrix products.

VII.A. Categorical Data

As used herein, categories are mappings between dimensions. As previously
discussed, the process of making recommendations for clients using one dimension's
ratings for another dimension's recommendations concerned a situation in which there
are ratings for both dimensions. Suppose that this is not the case, and that one only has
ratings data for dimension i, but that one has a mapping between dimensions i and j. For
example, dimension j may represent categories for items contained in dimension i. Let
this mapping, denoted by T, be given by

$$T_{i,c} = \begin{cases} 1 & \text{if item } i \text{ is contained in category } c \\ 0 & \text{otherwise} \end{cases}$$

If matrix R is multiplied on the right by T, the resulting matrix may be considered
a rating matrix of clients to categories. It is interesting to note that the scale should now
be considered an interest scale as discussed in more detail below in Section XII.C.1,
because higher valued entries denote that the respective client rated more items in this
category than in category entries with lower values.
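
As an illustration of the mapping just described, the following is a minimal sketch, assuming a small dense unary ratings matrix and NumPy. The toy values of `R` and `T` and the variable name `interest` are hypothetical and are not taken from the specification.

```python
import numpy as np

# Hypothetical toy data: 3 clients x 4 items, unary ratings (1 = rated).
R = np.array([[1, 0, 1, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 1]])

# T[i, c] = 1 if item i is contained in category c, else 0 (2 categories).
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

# Client-to-category "interest" matrix: entry (u, c) counts the items
# in category c that client u has rated.
interest = R @ T
```

In this sketch a larger entry of `interest` simply means that the client rated more items in that category, which is the sense in which the scale may be read as an interest scale.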

However, the discussion in this Section is concerned with the issue of mapping
one's ability to make recommendations in one dimension to making recommendations in
another in which there is no ratings data. For this, the model is defined by

$${}_{(I-I)}M = R^{t} \cdot R \cdot T$$

and one possible recommendation model is given by (step 615)

Unaiy_Multi_VoteO-,c)£, (R^k) = TOP^, ( ^ ^Mii^yiK )^)) 

z<=R 

u, * 

In this instance, T induces a ratings matrix for dimension c. This result suggests 
another approach for making recommendations in this case (step 630) by letting 

p_jff= Rt ' Unary(i? • T) 

in step 615.

VII.B. Distributed Modeling

In this section, the following temporal concepts are discussed: (1) Model
Creation (MC): the point in time at which the model is created; (2) Runtime Model
Creation (RMC): the point in time at which the runtime model is created; (3) Request
Recommendation (RR): the point in time at which the recommendation is requested;
and (4) Recommendation Process (RP): the point in time at which the recommendation
is processed. From the above itemization of concepts, the "Recommendation Process" is
essentially the final modification of the recommendation model (step 630). In a preferred
embodiment of the present invention the temporal ordering among these four events is
such that each event preferably occurs before the next may be considered for processing.
However, such ordering is not a requirement of the present invention.

Further still, Personalization Identifiers (PI) are the key indicators that allow
personalization to begin. Before the personalization identifiers are known, it is preferable
to have as much model creation as possible occur. After the PIs exist and a
recommendation request occurs (at runtime), the final recommendation process may
begin. In addition, Personalization Identification (PIndent) is the time at which the
personalization identifiers have become known.

One aspect of these definitions is that in scenarios in which personalization
identification occurs before the request for personalization, there is an opportunity for
efficiently pre-calculating derived models that are customized to the fact that
personalization identification has occurred. This provides an opportunity for extremely
detailed personalization. In addition, the following list indicates areas in which
distributed computing techniques may be involved: (i) parallel processing of model
creation and various derived models; (ii) parallel processing of the recommendation
process (i.e., the construction of the final recommendation); (iii) business-to-business
model sharing; and (iv) efficient calculation of the entire model, together with
distributing derived models to locations where the local recommendations are made.
Specific implementations of distributed model creation and manipulation are discussed in
detail below.



VII.C. Distributing Model Creation

As discussed earlier regarding ratings matrices based in different dimensions,
formulas were presented that described the calculation required to compute either the
complete co-rate model or the runtime co-rate models corresponding to either $R^{t}R$ or
$RR^{t}$. Since the ratings matrix R may be banded or striped, as discussed below, the
calculation from that section yields a manner in which to distribute the calculation of
these models, as, for example, in step 615 of FIG. 5.



VII.C.1. Banding by Rows

Consider a partition of all clients for which there exists an item rating. Denote this
partition as $B = (B_1, B_2, \ldots, B_k)$ and define the bands

$$R^{(j)}_{u,i} = \begin{cases} R_{u,i} & \text{if } u \in B_j \\ 0 & \text{otherwise} \end{cases}$$



Since the bands partition the ratings by clients, it is simple to derive an
update formula directly. The result is a special case of a derivation provided below.
Consider

$$R = R^{(1)} + R^{(2)} + \cdots + R^{(k)}$$

where $R^{(j)}$ is defined above. The model is given by the following calculation

$$R^{t}R = \Big(\sum_{i=1}^{k} R^{(i)}\Big)^{t} \Big(\sum_{j=1}^{k} R^{(j)}\Big)$$

Collecting terms and reordering the summations gives

$$R^{t}R = \sum_{i=1}^{k}\sum_{j=1}^{k} (R^{(i)})^{t} R^{(j)} = \sum_{i=1}^{k}\Big[(R^{(i)})^{t} R^{(i)} + \sum_{j:\, i<j\le k} \big((R^{(i)})^{t} R^{(j)} + (R^{(j)})^{t} R^{(i)}\big)\Big]$$

where the internal summation is over terms that are all zero. Hence the derivation reduces to

$$R^{t}R = \sum_{i=1}^{k} (R^{(i)})^{t} R^{(i)}$$

This formula implies that, banded by clients, the co-rate model $R^{t}R$ may be
distributed to multiple nodes of a computing cluster. Each node of such a cluster can
compute its respective piece of the model. After any two nodes of such a cluster have
computed their portion(s) of the model, communication to add the terms computed is
permissible, etc. If there are $2^{n}$ bands of clients, the final model may be constructed with
as few as n parallel steps, with a total of $2^{n} - 1$ summations. Powers of two have been
chosen for convenience. This will now be clarified by the following recursion formula.
Let the base of the recursion be defined by

$${}_{(I-I)}M_j(0) = (R^{(j)})^{t} R^{(j)} \quad \text{where } j : 1 \le j \le 2^{n}$$

the recursion is now given by

$${}_{(I-I)}M_j(k) = M_{2j-1}(k-1) + M_{2j}(k-1) \quad \text{where } j : 1 \le j \le 2^{n-k} \text{ and } k : 1 \le k \le n$$

When k = n, j can only take the value 1, at which point the recursion terminates
and

$${}_{(I-I)}M_1(n) = R^{t}R$$

The distribution of the calculation is made easier due to the cross-band terms 
reducing to zero. In the next section this will not be the case. 
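
The banded calculation and the pairwise recursion can be illustrated with a minimal single-process sketch, assuming NumPy, a small dense unary ratings matrix, and a number of bands that is a power of two. The helper names `band_by_rows` and `tree_reduce` are hypothetical, and each list entry stands in for the work of one cluster node.

```python
import numpy as np

def band_by_rows(R, bands):
    """Return one matrix per band; rows outside the band are zeroed."""
    out = []
    for rows in bands:
        Rj = np.zeros_like(R)
        Rj[rows, :] = R[rows, :]
        out.append(Rj)
    return out

def tree_reduce(parts):
    """Pairwise summation M_j(k) = M_{2j-1}(k-1) + M_{2j}(k-1).

    Assumes len(parts) is a power of two.
    """
    steps = 0
    while len(parts) > 1:
        parts = [parts[i] + parts[i + 1] for i in range(0, len(parts), 2)]
        steps += 1
    return parts[0], steps

rng = np.random.default_rng(0)
R = (rng.random((8, 5)) < 0.4).astype(int)     # toy unary ratings, 8 clients
bands = [[0, 1], [2, 3], [4, 5], [6, 7]]       # 2**2 bands of clients
partial = [Rj.T @ Rj for Rj in band_by_rows(R, bands)]  # base case M_j(0)
model, steps = tree_reduce(partial)
```

With four bands the reduction finishes in two parallel steps, and `model` agrees with the product computed directly from the full matrix because the cross-band terms vanish.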


VII.C.2. Striping by Columns 

In this section, the ratings matrix will be striped by items instead of banded by
clients. The motivation for this approach is to calculate extremely large data sets in which
both $R^{t}R$ and $RR^{t}$ are to be calculated. Striping by items is preferred for the $R^{t}R$ case
(and by client for the $RR^{t}$ case). In either case, striping as opposed to banding causes the
cross-stripe terms to be non-zero, requiring some discussion of distributing the calculation
and combining intermediate results to calculate the final result. Many of the previous
results still apply and will be the starting point for this discussion.

Consider that there are N blocks enumerated from 0 to N-1 as $B = (B_0, B_1, \ldots, B_{N-1})$
with the following definition:

$$R^{(j)}_{u,i} = \begin{cases} R_{u,i} & \text{if } i \in B_j \\ 0 & \text{otherwise} \end{cases}$$

Similar to before, one has

$$R = R^{(0)} + R^{(1)} + \cdots + R^{(N-1)}$$

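By reorganizing the terms

$$
\begin{aligned}
R^{t}R &= \sum_{i=0}^{N-1}\Big[(R^{(i)})^{t}R^{(i)} + \sum_{j:\, i<j\le N-1}\{R^{(i)},R^{(j)}\}\Big] \\
&= \sum_{i=0}^{N-1}(R^{(i)})^{t}R^{(i)} + \sum_{i=0}^{N-1}\ \sum_{j:\, i<j\le N-1}\{R^{(i)},R^{(j)}\} \\
&= \sum_{i=0}^{N-1}(R^{(i)})^{t}R^{(i)} + \sum_{(i,j):\, 0\le i<j\le N-1}\{R^{(i)},R^{(j)}\}
\end{aligned}
$$

Therefore, one finds that the previous recursion applies to the first summation, which
leaves distributing the second summation. Reordering the second summation yields

$$
\begin{aligned}
\sum_{(i,j):\, 0\le i<j\le N-1}\{R^{(i)},R^{(j)}\}
&= \sum_{k=1}^{N-1}\ \sum_{\substack{0\le i,j\le N-1 \\ j-i=k}}\{R^{(i)},R^{(j)}\} \\
&= \sum_{k=1}^{N-1}\ \sum_{i=0}^{N-k-1}\{R^{(i)},R^{(i+k)}\}
\end{aligned}
$$

where

$$\{R^{(i)},R^{(j)}\} = \begin{cases} \langle R^{(i)},R^{(j)}\rangle & \text{if } i<j \\ \langle R^{(j)},R^{(i)}\rangle & \text{if } i>j \\ 0 & \text{otherwise} \end{cases}$$

and $\langle R^{(i)},R^{(j)}\rangle = (R^{(i)})^{t}R^{(j)} + (R^{(j)})^{t}R^{(i)}$ denotes the cross-stripe term.

In this summation, only the case i < j is realized, but the notation will be useful in
the next section. If the computing cluster has N nodes, and $R^{(i)}$ is stored on the i-th node,
then the above summation indicates which nodes communicate to complete the
computation of the model.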



VII.C.3. An Example of Distributed Model Creation Striped by Columns

Consider a four-node computing cluster for which there are 16 stripes required to
represent R. Tables A1 and A2 indicate the distribution of work in the case of 16 nodes.
In this scenario, half of the nodes of the cluster are idle in stage eight.

Table A1

Stage   node 0   node 1   node 2   node 3   node 4   node 5   node 6   node 7
  1     (0,1)    (1,2)    (2,3)    (3,4)    (4,5)    (5,6)    (6,7)    (7,8)
  2     (0,2)    (1,3)    (2,4)    (3,5)    (4,6)    (5,7)    (6,8)    (7,9)
  3     (0,3)    (1,4)    (2,5)    (3,6)    (4,7)    (5,8)    (6,9)    (7,10)
  4     (0,4)    (1,5)    (2,6)    (3,7)    (4,8)    (5,9)    (6,10)   (7,11)
  5     (0,5)    (1,6)    (2,7)    (3,8)    (4,9)    (5,10)   (6,11)   (7,12)
  6     (0,6)    (1,7)    (2,8)    (3,9)    (4,10)   (5,11)   (6,12)   (7,13)
  7     (0,7)    (1,8)    (2,9)    (3,10)   (4,11)   (5,12)   (6,13)   (7,14)
  8     (0,8)    (1,9)    (2,10)   (3,11)   (4,12)   (5,13)   (6,14)   (7,15)

Table A2

Stage   node 8   node 9   node 10  node 11  node 12  node 13  node 14  node 15
  1     (8,9)    (9,10)   (10,11)  (11,12)  (12,13)  (13,14)  (14,15)  (15,0)
  2     (8,10)   (9,11)   (10,12)  (11,13)  (12,14)  (13,15)  (14,0)   (15,1)
  3     (8,11)   (9,12)   (10,13)  (11,14)  (12,15)  (13,0)   (14,1)   (15,2)
  4     (8,12)   (9,13)   (10,14)  (11,15)  (12,0)   (13,1)   (14,2)   (15,3)
  5     (8,13)   (9,14)   (10,15)  (11,0)   (12,1)   (13,2)   (14,3)   (15,4)
  6     (8,14)   (9,15)   (10,0)   (11,1)   (12,2)   (13,3)   (14,4)   (15,5)
  7     (8,15)   (9,0)    (10,1)   (11,2)   (12,3)   (13,4)   (14,5)   (15,6)
  8     No-Op    No-Op    No-Op    No-Op    No-Op    No-Op    No-Op    No-Op


In the alternative, one may reorganize the terms to be calculated for computation
on a four-node cluster as indicated below in Tables B1 and B2. Note, however, that the
partitioning presented is not the only way to partition the calculation.

Table B1

Stage   node 0   node 1   node 2   node 3
  1     (0,1)    (1,2)    (2,3)    (3,4)
  2     (0,2)    (1,3)    (2,4)    (3,5)
  3     (0,3)    (1,4)    (2,5)    (3,6)
  4     (0,4)    (1,5)    (2,6)    (3,7)
  5     (0,5)    (1,6)    (2,7)    (3,8)
  6     (0,6)    (1,7)    (2,8)    (3,9)
  7     (0,7)    (1,8)    (2,9)    (3,10)
  8     (0,8)    (1,9)    (2,10)   (3,11)
  9     (8,9)    (9,10)   (10,11)  (11,12)
 10     (8,10)   (9,11)   (10,12)  (11,13)
 11     (8,11)   (9,12)   (10,13)  (11,14)
 12     (8,12)   (9,13)   (10,14)  (11,15)
 13     (8,13)   (9,14)   (10,15)  (11,0)
 14     (8,14)   (9,15)   (10,0)   (11,1)
 15     (8,15)   (9,0)    (10,1)   (11,2)
 16     No-Op    No-Op    No-Op    No-Op

Table B2

Stage   node 0   node 1   node 2   node 3
 17     (4,5)    (5,6)    (6,7)    (7,8)
 18     (4,6)    (5,7)    (6,8)    (7,9)
 19     (4,7)    (5,8)    (6,9)    (7,10)
 20     (4,8)    (5,9)    (6,10)   (7,11)
 21     (4,9)    (5,10)   (6,11)   (7,12)
 22     (4,10)   (5,11)   (6,12)   (7,13)
 23     (4,11)   (5,12)   (6,13)   (7,14)
 24     (4,12)   (5,13)   (6,14)   (7,15)
 25     (12,13)  (13,14)  (14,15)  (15,0)
 26     (12,14)  (13,15)  (14,0)   (15,1)
 27     (12,15)  (13,0)   (14,1)   (15,2)
 28     (12,0)   (13,1)   (14,2)   (15,3)
 29     (12,1)   (13,2)   (14,3)   (15,4)
 30     (12,2)   (13,3)   (14,4)   (15,5)
 31     (12,3)   (13,4)   (14,5)   (15,6)
 32     No-Op    No-Op    No-Op    No-Op


FIG. 6 depicts a conventional node communication pattern. Within each box at a
given stage the term being computed is indicated. The number at the tail of each arrow
indicates the block being passed at that stage. FIG. 6 also indicates some general
properties that are useful in characterizing the general case. In order to precisely describe
the processing, one may make the following definitions. Let k denote the number of
nodes in the computing cluster and N the number of stripes of R such that 2k divides
N. Defining the permutation matrix for the shift operator as

$$(\Theta_k)_{i,j} = \begin{cases} 1 & \text{if } j>i,\ j-i=1,\ 0\le i,j\le k-1 \\ 1 & \text{if } j<i,\ i-j=k-1,\ 0\le i,j\le k-1 \\ 0 & \text{otherwise} \end{cases}$$

the n-th power of $\Theta_k$ is given by the following

$$(\Theta_k^{\,n})_{i,j} = \begin{cases} 1 & \text{if } j = (n \bmod k + i) \bmod k,\ 0\le i,j\le k-1 \\ 0 & \text{otherwise} \end{cases}$$



In general, blocks being received by the nodes at stage n are given by $(\Theta_k)^{n}$
multiplied by a vector which characterizes the blocks sent at stage n. The block number
that node i sends at stage n for a cluster size of k is denoted by Send_Block(n; i; k) and is
given functionally by



u 



(«-/-!) Mod 



Send__BIock(«; i; k) = < 



-J+r5ij-*+ij 



Mod ^ : if 0 * « Mod^> i 



: if0^nMod^<? 



0 



:Q = n Mod 



The received bands are characterized by

Received_Block(n; i; k) = Send_Block(n; j; k)

Note that the functional form of $(\Theta_k)^{n}$ gives the communication required at stage n for
each node of a computing cluster with k nodes. As such, the function
Received_from(n; i; k) is defined by

Received_from(n; i; k) = (n Mod k + i) Mod k

and reducing Received_Block(n; i; k) gives

Received_Block(n; i; k) = Send_Block(n; Received_from(n; i; k); k)

2((»-l)Mocy N n 
Current_Block(«; z; k) = L J ^ + L J •£ + i 

Send_to(n; i; k) = (k - n Mod k + i) Mod k

Putting partial calculations together, it is now possible to express functionally the
calculation at the i-th node at the n-th stage. In what follows, k and N are suppressed. Let

rb_i(n) = Received_Block(n; i; k)

cb_i(n) = Current_Block(n; i; k)

$$\mathrm{Current\_Block\_Matrix}(n; i) = \{R^{(cb_i(n))}, R^{(rb_i(n))}\} = \begin{cases} \langle R^{(cb_i(n))}, R^{(rb_i(n))}\rangle & \text{if } cb_i(n) < rb_i(n) \\ \langle R^{(rb_i(n))}, R^{(cb_i(n))}\rangle & \text{if } cb_i(n) > rb_i(n) \\ 0 & \text{otherwise} \end{cases}$$

Accordingly, the process of distributing model creation is easily implemented
consistent with the present invention.
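
The staged exchange of stripes can be illustrated with a minimal single-process simulation, assuming NumPy, one stripe per node, and the shift pattern given by Received_from and Send_to above. The function `striped_model`, the use of `np.array_split` to form stripes, and the rule for skipping the idle half of the nodes in the last stage are illustrative assumptions rather than the exact procedure of the specification.

```python
import numpy as np

def received_from(n, i, k):
    """Node whose block node i receives at stage n."""
    return (n % k + i) % k

def send_to(n, i, k):
    """Node to which node i sends its block at stage n."""
    return (k - n % k + i) % k

def striped_model(R, k):
    """Assemble R^t R from k column stripes, one stripe per simulated node."""
    n_items = R.shape[1]
    stripes = np.array_split(np.arange(n_items), k)
    M = np.zeros((n_items, n_items), dtype=R.dtype)
    for cols in stripes:                      # diagonal blocks: no communication
        M[np.ix_(cols, cols)] = R[:, cols].T @ R[:, cols]
    pairs = set()
    for n in range(1, k // 2 + 1):            # communication stages
        for i in range(k):
            j = received_from(n, i, k)
            if k % 2 == 0 and n == k // 2 and i >= k // 2:
                continue                      # No-Op: pair already covered
            a, b = min(i, j), max(i, j)
            block = R[:, stripes[a]].T @ R[:, stripes[b]]
            M[np.ix_(stripes[a], stripes[b])] = block
            M[np.ix_(stripes[b], stripes[a])] = block.T
            pairs.add((a, b))
    return M, pairs

rng = np.random.default_rng(1)
R = (rng.random((6, 32)) < 0.3).astype(int)   # toy unary ratings, 32 items
M, pairs = striped_model(R, 16)
```

For 16 simulated nodes the visited pairs follow the pattern of Tables A1 and A2: every unordered pair of stripes is computed exactly once, and the second half of the nodes has nothing left to do in stage eight.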



VII.C.4. Distributed Computing for Recommendation Processing

It is also possible not to add all the pieces together and to make recommendations
directly from the nodes. That is, it is possible to have node i keep the portion of the
runtime model for the items in $R^{(i)}$. In a preferred embodiment, each node must
communicate with the other nodes to compute the proper portion of the model. This is
given by



M 



2 n -i 



node . , 



The run- time distributed model (step 615) is given by 
r® 

M , = Unary? (M , ) 

node. /c v node: 

i i 

Furthermore, let S be a set of items from which a recommendation will be
processed. If one partitions this set of items according to which nodes represent the
items of S, then $S = \bigcup_{i \le 2^{n}} S_i$, where $S_i$ is the restriction of S to the i-th node. Since
the union is disjoint, the voting algorithm may be distributed according to the formula



f 



Unary_Multi_Vote£,OS ? k) = TOP^, 



z I' 



' r(k) ^ ^ 



M 



node. 



i * 



v - <c ^- J 

Accordingly, the results from a combination of nodes may be efficiently 
processed consistent with the present invention. 

VII.D. Business-to-Business Model Sharing 

Next, one can consider either different divisions of a company or different
companies that identify a common set of clients from which to build cross-company or
cross-division co-rate models. This is straightforward given the method developed for
parallel dimension and distributed computing above. The only issue is one of agreement
on a set of clients.


VII.E. Mobilized Distributed Personalization

Having described the methods for calculating models in their entirety, by
computing pieces of models and computing derived models, the next task focuses on
distributing the model to localities at which recommendation processing will occur.
Models may be distributed in their entirety if memory constraints permit. More to the
point, by having pre-processed the models it is now feasible to distribute only portions of
the model that are usable by a given set of personalization identifiers (i.e., a single client's
ratings). It is shown below that a personalized model may be produced, thereby further
personalizing the client's experience. In exactly the same fashion that a model may be
partitioned for model creation and the recommendation process, a client's ratings may be
partitioned according to categories.

Suppose that there are q categories. One may denote the list of categories as
$C = (C_1, \ldots, C_q)$. Each $C_j$ is a list of items consisting of the members of the j-th
category. Since $C_j$ is a set of items, it may be considered a sparse vector with values of
ones. As such, $D(C_j)$ is the diagonal matrix with ones on the diagonal corresponding to
the list $C_j$. Let (step 610)



The model for co-rating all items to category $C_j$ is given by (step 615)

$$M^{r(k)}(C_j) = \mathrm{Unary}_k\big(R^{t} R^{C_j}\big)$$

where j runs over the list of categories C. Suppose that one wants to specialize these
models to a client's rating. This may be useful because one is going to distribute the
model. Consider starting with $D(R_u)$, the diagonal matrix corresponding to client u's
ratings. In a preferred embodiment, it will be customary to personalize the models
$M^{r(k)}(C_j)$ to a client's ratings given by the non-zero rows of (step 615)

$$M_u^{r(k)}(C_j) = D(R_u)\, M^{r(k)}(C_j)$$

For example, suppose "restaurants" is a category; then a model for recommending
restaurants from client u is given by $M_u^{r(k)}(C_{\text{restaurants}})$. Since the personalized model
is produced by multiplying by a sparse diagonal matrix, the number of non-zero rows in
question should be significantly smaller than that of the respective model. As a result, the
collection of personalized models across all categories may be a reasonably sized set.
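
The effect of multiplying by a sparse diagonal matrix can be illustrated with a minimal sketch, assuming NumPy and small dense matrices. The toy ratings, the category membership `restaurants`, and the use of the product of the co-rate matrix with a diagonal category matrix are assumptions made for illustration; the truncation performed by the Unary operator is omitted because it is not defined in this section.

```python
import numpy as np

R = np.array([[1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 1, 1, 1]])          # toy unary ratings: 3 clients x 4 items
restaurants = [2, 3]                  # hypothetical category (item indices)

D_C = np.zeros((4, 4), dtype=int)
D_C[restaurants, restaurants] = 1     # ones on the diagonal for category items
M_C = R.T @ R @ D_C                   # co-rates of all items into the category

u = 0
D_u = np.diag(R[u])                   # ones at the items client u has rated
M_u = D_u @ M_C                       # rows for unrated items become zero

nonzero_rows = np.flatnonzero(M_u.any(axis=1))
```

Only the rows for the items that client `u` has rated survive in `M_u`, which is why the personalized model for a single category can be much smaller than the full category model.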

One may define the block matrix representing these derived models by 



M«'"K*)(C,) 



15 jtf"/K*)(Q = 



One may also consider the matrix that characterizes a client's rating across the list of 







R 




1 





Although $R_u(C)$ and $M_u^{r(k)}(C)$ are somewhat complicated to express formally, in
practice the amount of represented data is relatively small as compared to the entire
model. These sets correspond to the personalized portions of the model with respect to
client u and the category partitioning induced by C. Making j-recommendations from a
unary multiplicity voting algorithm using client u's i-ratings may be given by



Of course, there are other meaningful expressions for making such 
recommendations if the co-rate values are taken into consideration. Furthermore, it is 
also consistent with the present invention to employ a client's entire set of ratings to make 
recommendations into a category, and if one of the categories is the entire item set then 
this will, in fact, be the case. This formulation enables recommendations to be made from 
a client's ratings in either a specific category or across categories. In all cases, if the
client's personalization identifiers are known prior to the request for recommendation, the
respective parts of the personalized model may be calculated and distributed to the
location where the recommendations will be processed. This might be a desktop unit,
cellular phone, personal digital assistant, digital assistant in a car, or some other device.



VII.E.1. A scenario for personalization using personal digital assistants

Further still, as an example, consider the following scenario. A person on a
business trip carries a personal digital assistant (PDA). The PDA may or may not have
Internet connectivity. If Internet connectivity is not present, then it is assumed that the
person has some other means of Internet connectivity. Suppose that a database exists that
contains this person's ratings for restaurants, theater, recreation, clothing, shopping, etc. It
would be useful to make recommendations for any of these categories from this person's
ratings of such categories. As an example, it makes no sense to recommend a restaurant
in a different city. On the other hand, it is of utility to load derived models, such as those
described in this section, directly into the client's PDA. This may be achieved by the
PDA's direct Internet connectivity or by some other means. In either case, the models
needed to make personalized recommendations for this client may be restricted to a
memory allocation size to allow the entire footprint to fit into the PDA. In such a case,
the algorithms of this and other sections are of a sufficiently simple nature that
computation on the PDA is possible. If the PDA does in fact have connectivity, then an
update of its internal models is possible on an ongoing basis. Otherwise, it could be
updated out-of-band. Although out-of-band updating is less responsive than
in-band updating, the rate at which added ratings will change the outcome of the
recommendations is a minor effect in many scenarios. In this manner, a PDA could have
the required runtime capability to make recommendations personalized to a client's
ratings and derived personalized models. Thus, recommending breakfast in a particular
city may be personalized depending on what one had for dinner the night before.



VIII. SECOND EMBODIMENT OF THE PRESENT INVENTION

In a second embodiment of the present invention, depicted in FIG. 7, a method for
providing a recommendation to a client is based upon a model that is incrementally
updated. Within FIG. 3, this corresponds to the process indicated by update box 140, and
in FIG. 7 this corresponds to step 720.

A fresh model for on-line (or runtime) processing is required in order to make
accurate recommendations. Whether the model is computed from scratch or the model is
incrementally updated, the operations performed reduce to efficiently performing sparse
matrix operations.



VIII.A. Incrementally Updating Model and Model Creation

Updating co-rates is a means by which a recommendation model is incrementally
updated in a fashion analogous to the fundamental theorem of integral calculus.
Specifically, and as discussed earlier, the model for making recommendations comes
from the matrix of co-rated items ${}_{(I-I)}M = R^{t}R$. For now, consider that the number of
items is fixed, resulting in ${}_{(I-I)}M$ being of fixed dimension. As time evolves, the matrix
${}_{(I-I)}M$ changes and it is natural to consider the matrix to be a function of time, denoted
${}_{(I-I)}M(t)$. Imagine that one could instantaneously compute the matrix of co-rated items.
Clearly, this would be ideal. As new ratings arrive, the co-rate matrix jumps at the time of
updating, first taking its initial value and then jumping to its next value, and so on. Note
that the fixed number of items results in a model of bounded dimension. This is not
general enough for the purposes of the present invention and as such the underlying state
space for R should be extended to be infinite by infinite matrices indexed by the
non-negative integers by the non-negative integers, with the property that only a finite number
of clients and items have non-zero entries. For example, $R_{u,i}$ is really a doubly indexed
array, starting at zero for each of the independent coordinates. All that one is now
allowing is that the arrays in question are finite and not of fixed dimension. The extension
requires that our methods for update work independently of the number of clients and
items. Since the multiplication of two matrices from our extended state space yields a
matrix in the state space, the methods will function smoothly.

Note that in a time interval [0, T] there are only a finite number of jumps and
therefore the jump sequence may be enumerated as $t_0 = 0 < t_1 < t_2 < t_3 < \ldots < t_n < T$. At
any time in [0, T), the model's change is given by

$$\delta[{}_{(I-I)}M(t)] = {}_{(I-I)}M(t) - {}_{(I-I)}M(t^{-}) = R^{t}(t)R(t) - R^{t}(t^{-})R(t^{-})$$

where $t^{-}$ indicates the use of a left limit. In other words, if there is a jump at time t, then
$R(t^{-})$ is the ratings matrix the moment just before the update occurred. If t is not in the
jump sequence, then $\delta[{}_{(I-I)}M(t)]$ equals the matrix of all zeros. Since the jump times are
discrete, for two consecutive jump times one has the relationship $R(t_j) = R(t_{j+1}^{-})$.
This is nothing more than a restatement that R is constant between updates. Let time T >
0 be chosen and as above enumerate the times at which jumps occur. We can express the
matrix of co-rates as a telescoping sum of incremental updates, given by, in general,
(considering both ${}_{(I-I)}M(t)$ and ${}_{(C-C)}M(t)$)

$$M(T) = M(t_n) = (M(0) - M(0^{-})) + (M(t_1) - M(t_1^{-})) + \ldots + (M(t_n) - M(t_n^{-}))$$

As a reasonable convention, one may choose $M(0^{-})$ to be the matrix of zeros.
Letting $t_0 = 0$, M(T) may be expressed in the following summation (step 720 of FIG. 7):

$$M(T) = \sum_{j=0}^{n} \delta M(t_j) = M(0) + \sum_{j=1}^{n} \delta M(t_j)$$

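This expression is a discrete analogue to the Fundamental Theorem of Integral
Calculus. Accordingly, one may return to $\delta[{}_{(I-I)}M(t)]$ and compute a product rule for the
increment:

$$
\begin{aligned}
\delta[{}_{(I-I)}M(t)] &= R^{t}(t)R(t) - R^{t}(t^{-})R(t^{-}) \\
&= R^{t}(t)R(t) - R^{t}(t)R(t^{-}) + R^{t}(t)R(t^{-}) - R^{t}(t^{-})R(t^{-}) \\
&= R^{t}(t)\big(R(t) - R(t^{-})\big) + \big(R^{t}(t) - R^{t}(t^{-})\big)R(t^{-}) \\
&= R^{t}(t)\,\delta R(t) + \delta R^{t}(t)\,R(t^{-}) \\
&= R^{t}(t)\,\delta R(t) + [\delta R(t)]^{t}R(t^{-}) \\
&= [R(t^{-}) + \delta R(t)]^{t}\,\delta R(t) + [\delta R(t)]^{t}R(t^{-}) \\
&= [R(t^{-})]^{t}\,\delta R(t) + [\delta R(t)]^{t}\,\delta R(t) + [\delta R(t)]^{t}R(t^{-}) \\
&= [R(t^{-})]^{t}\,\delta R(t) + [\delta R(t)]^{t}R(t^{-}) + [\delta R(t)]^{t}\,\delta R(t) \\
&= [R(t^{-})]^{t}\,\delta R(t) + \big[[R(t^{-})]^{t}\,\delta R(t)\big]^{t} + [\delta R(t)]^{t}\,\delta R(t) \\
&= R^{t}(t^{-})\,\delta R(t) + [R^{t}(t^{-})\,\delta R(t)]^{t} + [\delta R(t)]^{t}\,\delta R(t)
\end{aligned}
$$

Note that $\delta[{}_{(I-I)}M(t)]$ is a symmetric matrix and only involves the values of $R(t^{-})$ (the
sparse matrix of step 705) and $\delta R(t)$ (the update ratings matrix of step 710). Although it
appears that all of $R(t^{-})$ is being used, this is not necessary. We only need to extract the
client rows that can contribute to $\delta R(t)$.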
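
The product rule for the increment can be checked numerically with a minimal sketch, assuming NumPy and small dense matrices in place of the sparse representation. The variable names `R_old`, `dR`, and `touched` are hypothetical; `R_old` plays the role of the ratings matrix just before the update and `dR` the role of the update ratings matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
R_old = (rng.random((6, 5)) < 0.4).astype(int)   # ratings just before the update

dR = np.zeros_like(R_old)                        # newly arrived ratings
dR[1, 3] = 1 - R_old[1, 3]                       # add a rating only if absent
dR[4, 0] = 1 - R_old[4, 0]
R_new = R_old + dR                               # ratings after the update

cross = R_old.T @ dR
dM = cross + cross.T + dR.T @ dR                 # increment of the co-rate model

# Only the client rows touched by the update contribute to the cross term.
touched = np.flatnonzero(dR.any(axis=1))
cross_small = R_old[touched].T @ dR[touched]
```

Adding `dM` to the old co-rate matrix gives the same result as recomputing the product from the updated ratings, and the cross term can be formed from the touched client rows alone.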

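Letting $\delta I$ denote the set of items that have changed ratings, one obtains
(step 720)

$$\delta[{}_{(I-I)}M(t)] = R^{t}_{*,\delta I}(t^{-})\,\delta R(t) + [R^{t}_{*,\delta I}(t^{-})\,\delta R(t)]^{t} + [\delta R(t)]^{t}\,\delta R(t)$$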

VIII.B. Directly Calculating the Runtime Model

In practice, the matrices $R^{t}R$ and $RR^{t}$ may become extremely large. It is
preferable, therefore, to calculate the runtime models iteratively in a manner that allows
one to truncate the model at intermediate steps of the calculation. It is also important to
note that the approach described here works for both item co-rate models and client
co-rate models. This gives the ability to efficiently calculate the top k client neighbors
exactly.

VIILB. 1 . Item-Based Models 

One can consider a partition of all the items for which there exists a client rating. 

20 One may denote this partition as B = (B. , B~, . . . , Bj) and define (tep 715) 
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UJ, 



R .AfieB. 

u,i j 

0 : otherwise 



10 



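In a manner similar to that used in the parallel dimension model analysis, one may
consider striping the rating matrix such that (step 715)

$$R = \sum_{j=1}^{k} R^{(j)}$$

where $R^{(j)}$ is defined above. The model is given by the following calculation (step 720)

$$R^tR = \Big(\sum_{i=1}^{k} R^{(i)}\Big)^t \Big(\sum_{j=1}^{k} R^{(j)}\Big) = \sum_{i=1}^{k}\sum_{j=1}^{k} (R^{(i)})^t R^{(j)}$$

Collecting terms and reordering the summations yields

$$R^tR = \sum_{i=1}^{k}\Big[(R^{(i)})^t R^{(i)} + \sum_{j:\,i<j\le k} \langle R^{(i)}, R^{(j)} \rangle\Big]$$

where the cross-dimensional terms are defined by

$$\langle R^{(i)}, R^{(j)} \rangle = (R^{(i)})^t R^{(j)} + (R^{(j)})^t R^{(i)}$$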
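The striped decomposition can be illustrated as follows. This is a minimal sketch, assuming dense list-of-lists matrices; the ratings matrix `R`, the partition `B`, and the helper `stripe` are hypothetical. It accumulates the diagonal blocks and the cross-dimensional terms one pair of stripes at a time and compares the result with the direct product.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def stripe(R, block):
    """Copy of R in which every column outside `block` is set to zero."""
    return [[v if i in block else 0 for i, v in enumerate(row)] for row in R]

# Hypothetical unary ratings: 3 clients x 4 items.
R = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
# Hypothetical partition of the item indices into three blocks.
B = [{0, 1}, {2}, {3}]
stripes = [stripe(R, block) for block in B]

n_items = len(R[0])
model = [[0] * n_items for _ in range(n_items)]
for i, Ri in enumerate(stripes):
    # Diagonal term (R^(i))^t R^(i).
    model = add(model, matmul(transpose(Ri), Ri))
    for Rj in stripes[i + 1:]:
        # Cross-dimensional term (R^(i))^t R^(j) + (R^(j))^t R^(i).
        cross = matmul(transpose(Ri), Rj)
        model = add(model, add(cross, transpose(cross)))

direct = matmul(transpose(R), R)
print(model == direct)
```

Because each term touches only two stripes, intermediate results could be truncated or written out before the next pair is processed; the sketch omits that step.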

The runtime model (step 720) may be produced in a variety of ways. One way is
to apply the unary operator $\mathrm{Unary}_k$ as follows

$$\mathrm{Unary}_k(R^tR) = \sum_{i=1}^{k}\Big[\mathrm{Unary}_k\big((R^{(i)})^t R^{(i)}\big) + \sum_{j:\,i<j\le k} \mathrm{Unary}_k\big(\langle R^{(i)}, R^{(j)} \rangle\big)\Big]$$

This yields an expression that enables iterative computation of an item-based model.

VIII.B.2. Client-Based Models

In many cases from real data, the matrix of co-rated clients is larger than the matrix
of co-rated items and contains more non-zero entries. As a result, the entire model for
co-rated clients cannot be stored because of memory and storage constraints. In
some cases, the same issue applies to the matrix of co-rated items. Hence it is important
to be capable of computing the runtime versions of these models. The result of the
previous section applies directly to client-based models by substituting $R^t$ for $R$
and noting that $(R^t)^t = R$. Doing so results in the following expression for the client
co-rate matrix (step 720)

$$RR^t = \sum_{i=1}^{k}\Big[\big((R^t)^{(i)}\big)^t (R^t)^{(i)} + \sum_{j:\,i<j\le k} \langle (R^t)^{(i)}, (R^t)^{(j)} \rangle\Big]$$




The above expression and that of the previous section indicate that in spite of
memory constraints, the runtime models for co-rates may be iteratively computed.
Although doing so makes updates harder, it does not prevent them. In practice, it may be
more efficient to re-compute in certain instances rather than computing the incremental
update. In either case, the methods employed yield the ability to produce the runtime
model directly from the calculation.

IX. THIRD AND FOURTH EMBODIMENTS OF THE PRESENT INVENTION

In a third and fourth embodiment of the present invention, depicted schematically
in FIG. 8, a method for providing a recommendation to a client is based upon the ability
to perturb a first model so as to generate a second model or, alternatively, upon the
truncation of a first model so as to generate a second model. This corresponds to
perturbed model oval 35 as depicted in FIG. 3, or step 820 in FIG. 8. One skilled in the
art should appreciate, however, that such a perturbing process may be implemented in a
variety of regions in FIG. 3 consistent with the present invention. Accordingly, the
methods for perturbing the models include truncation of the model for use in on-line or
runtime processing, perturbations that favor a set of items, and functional scaling. All of
the methods for perturbing the basic models are derived from either the mathematical
structures being used or their internal representation as compressed row matrices.

IX.A. Example of Skewed Recommendations

Suppose that one wants to calculate the perturbed recommendation system and an
associated marketing campaign. Suppose that widget X is assumed to be in a category Y
of widgets. One may wish to perturb the situation so that one is recommending widget X
for all the widgets in this category that have profit margins less than that of X. Let $S_X$
denote the set of widgets that X will replace. Assume that X replaces itself. In order to
skew the recommendations, one may construct the following matrix:

$$C_{i,j} = \begin{cases} 1 & \text{if } i = j \notin S_X \\ 1 & \text{if } j = X,\ i \in S_X \\ 0 & \text{otherwise} \end{cases}$$
Now one needs to compute the perturbed model for on-line recommendations.
Two possible matrix products yield skewed models (step 820):

or

$$M_r^{(k)} = \mathrm{Unary}_k(M \cdot C)$$

The marketing campaign for widget X can work directly from $M$, considering

$$R \cdot (\mathrm{Unary}(M_{X,*}))^t$$

as a ranking of clients who bought X's neighbors.
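The effect of the skew matrix can be sketched in a few lines. The example below is illustrative only and assumes a small dense item co-rate matrix `M`; the widget index `X`, the replaced set `S_X`, and the function name `skew_matrix` are hypothetical. It builds C from the rule stated above (identity on items outside the replaced set, and every replaced item mapped to X) and forms the product of M and C before any truncation.

```python
def matmul(a, b):
    cols = [list(col) for col in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def skew_matrix(n_items, x, s_x):
    """C[i][j] = 1 if i == j and i not in s_x; 1 if j == x and i in s_x; else 0."""
    C = [[0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        if i in s_x:
            C[i][x] = 1   # replaced items (including x itself) point to x
        else:
            C[i][i] = 1   # all other items are left unchanged
    return C

# Hypothetical symmetric item co-rate model for 4 widgets.
M = [[0, 2, 1, 3],
     [2, 0, 4, 1],
     [1, 4, 0, 2],
     [3, 1, 2, 0]]
X = 1           # widget to be promoted
S_X = {1, 2}    # widgets that X replaces; X replaces itself

C = skew_matrix(len(M), X, S_X)
skewed = matmul(M, C)
for row in skewed:
    print(row)
```

In the product, the co-rate weight of every replaced widget is added to the column of X and the replaced widget's own column becomes zero, so a subsequent top-k selection can no longer return the replaced widgets.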

Consider further that one has a set of clients $U$ and one wants to construct a
recommendation of the top-k items for this set of clients. One possible recommendation is
given by (step 835)

$$\mathrm{Unary\_Multi\_Vote}_{M_r}\Big(\mathrm{Unary}_q\Big(\sum_{u \in U} R_{u,*}\Big),\ k\Big)$$

where the parameter q indicates that one is only using the top q items purchased by
clients from $U$.

Further still, consider that one has a set of items $S$ and one wants to construct a
recommendation of the top-k clients for this set of items. This is a batch process
situation. There are two situations to consider: first, a batch process that returns a
ranking of clients for each item in the set $S$, and second, a recommendation of clients
for $S$ entirely.

The first situation is a generalization of the situation presented in the simple
scenario, given by

$$R \cdot (\mathrm{Unary}(M_{s,*}))^t \quad \text{for each } s \in S$$

and the second is

$$R \cdot \Big(\mathrm{Unary}\Big(\sum_{s \in S} M_{s,*}\Big)\Big)^t$$

One may introduce a q parameter by replacing the Unary operator with the
$\mathrm{Unary}_q$ operator. Furthermore, either the return vector needs to be sorted, or one
could gather only clients whose rank in the relative scaling is above a given threshold.

IX.B. Functional scalings 

The $\mathrm{Unary}_k$ function is preferably used for the construction of the on-line models.
Recall that, after this operator is applied, the non-zero entries of the resulting vector are
equally weighted for use in constructing recommendations. For some data sets, this is
fine, but for others, a relative scaling at the individual item neighborhood is relevant to
the formation of the final recommendation. In some cases, one may find that instead of
using $\mathrm{Unary}_k$, the use of $\mathrm{Index}_k$ is more suitable. It has also been useful to scale the
individual entries by a weight corresponding to the diagonal terms from $R^tR$. (One
skilled in the art should appreciate that the diagonal terms of $R^tR$ are the number of
times that an item has been rated.)

Now it is possible to describe a second level of abstraction for making
recommendations. Recall that a basic recommendation model used $M_r^{(k)} = \mathrm{Unary}_k(M)$.
If one lets $\alpha \in (0,1]$, then one can scale the entries of $M$ by the diagonal terms to the
$\alpha$ root and then use the $\mathrm{Unary}_k$ operator to construct $M_r^{(k)}$. The functional form of this
statement is given by (step 820)

$$M_r^{(k)} = \mathrm{Unary}_k\big(M \cdot ([(D_2 M)^{-1}]^{\alpha})\big)$$

and the basic form of the recommendation remains unchanged as

$$\mathrm{Unary\_Multi\_Vote}_{M_r}(S, k) = \mathrm{TOP}_k\Big(\sum_{i \in S} (M_r^{(k)})_{i,*}\Big)$$



Next, one may let $f$ be a function defined on the positive real numbers with values
in the positive real numbers. Furthermore, one may define the multiple-dimension
function given by $F = (f, f, \ldots, f)$. That is, $f$ acts on each coordinate, where the
dimension is determined by the context. In a preferred embodiment of the present
invention, the functional scaling is of the form (step 820)

$$M_r^{(k)} = \mathrm{Unary}_k\big(M \cdot (F((D_2 M)^{-1}))\big)$$

More generally, let $G = (g_{i,j})$ and let the recommendations be given by (step 820)

$$M_r^{(k)} = \mathrm{Unary}_k(M \cdot G(M))$$

Although this is not the most general functional form, this represents a preferred
scaling.



X. FIFTH EMBODIMENT OF THE PRESENT INVENTION 

In a fifth embodiment of the present invention, depicted in FIG. 9, a method for
providing a recommendation to a client is based upon the construction of a model using
cross-set co-occurrences (step 915).

For example, and considering the R-T-R oval 160 in FIG. 3, it may be that the left-hand
matrix represents a first category of data, and the right-hand matrix represents a
second category of data. In such a case, the model would then not be $R^tR$, a symmetric
matrix, but rather $A^tB$, which, in general, is not a symmetric matrix. Such a cross-set
model allows for an entirely new basis for recommendations. The first and second
matrices for the cross-set co-occurrences, as stated earlier, may be created at any stage of
processing, such as sub-space matrices corresponding to particular categories or
dimensions.
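A cross-set model can be sketched as an ordinary matrix product of two ratings matrices that share the same clients. The example below is a minimal illustration; the two categories ("books" and "films") and the data are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

# Hypothetical unary ratings for the same 3 clients over two categories.
A = [[1, 0],        # clients x books
     [1, 1],
     [0, 1]]
B = [[1, 1, 0],     # clients x films
     [0, 1, 0],
     [0, 1, 1]]

# Entry (i, j) counts the clients who rated book i and film j.
cross_set_model = matmul(transpose(A), B)
print(cross_set_model)
```

The result has one row per book and one column per film, so it is rectangular and has no symmetry to exploit, unlike the item co-rate model built from a single ratings matrix.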

XI. SIXTH EMBODIMENT OF THE PRESENT INVENTION

Further still, in a sixth embodiment of the present invention, a method for
providing a recommendation to a client is based upon the identification of a subset of
items through a multiplicity voting scheme, which may be personalized or may be
anonymous. This corresponds to, respectively, personal recommendation box 310 of
FIG. 3 or anonymous recommendation box 320.

XI.A. Making Anonymous Recommendations

One of the purposes of the calculations is to construct a recommendation model
with a desired property. A suitable set of models is constructed in a sequence of off-line
processing stages. This separation of processing minimizes the runtime evaluation as
described by the functions $\mathrm{Unary\_Multi\_Vote}_{M_r}(\cdot, k)$ and $\mathrm{Multi\_Vote}_{M_r}(\cdot, k)$.

Of course, any additional processing of utility may be further applied. However,
one focus of the present invention is to minimize the need for such runtime processing by
suitably constructing the models such that minimal runtime processing is required. Note
that the evaluation of Unary_Multi_Vote or Multi_Vote involves adding of vectors
from a matrix that represents the on-line model.

XI.B. Making Personalized Recommendations

The making of a personalized recommendation is an application of anonymous 
recommendation, in which the system first constructs a list of items that are personalized 
and uses this list in the anonymous recommendation strategies. 
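Section XI.A describes the voting functions as adding vectors from the on-line model matrix. A minimal sketch of that idea follows. It assumes, as an illustration only, that the unary top-k operator keeps the k largest non-zero entries of each row and sets them to one, and that the vote returns the k items with the highest summed weight; the function names, the tie-breaking by lowest index, and the data are hypothetical.

```python
def unary_k(M, k):
    """Keep the k largest non-zero entries of each row and set them to 1 (assumed semantics)."""
    out = []
    for row in M:
        ranked = sorted((j for j, v in enumerate(row) if v > 0),
                        key=lambda j: (-row[j], j))
        keep = set(ranked[:k])
        out.append([1 if j in keep else 0 for j in range(len(row))])
    return out

def unary_multi_vote(M_r, items, k):
    """Add the runtime-model rows of `items` and return the k highest-voted item indices."""
    n = len(M_r[0])
    votes = [sum(M_r[i][j] for i in items) for j in range(n)]
    ranked = sorted((j for j in range(n) if votes[j] > 0),
                    key=lambda j: (-votes[j], j))
    return ranked[:k]

# Hypothetical item co-rate model for 4 items.
M = [[0, 5, 1, 3],
     [5, 0, 4, 1],
     [1, 4, 0, 2],
     [3, 1, 2, 0]]
M_r = unary_k(M, 2)                      # built off-line

# Anonymous recommendation for the item list [0, 2]; a personalized
# recommendation would first derive this list from the client's own ratings.
recommended = unary_multi_vote(M_r, [0, 2], 2)
print(recommended)
```

At runtime the work reduces to adding a few short rows and selecting the largest entries, which is consistent with the stated goal of moving most processing off-line.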

XII. GENERALIZED ASPECTS OF THE PRESENT INVENTION 

The remainder of the discussion herein will focus primarily on generalizations 
surrounding aspects of the invention as described in the preferred embodiment. In turn, 
these generalizations encompass the topics of filtering, high-order state spaces, and data 
properties. The topic of sparse matrix calculations is also briefly discussed. 

XII.A. Generalized Filtering: Item Injection/Rejection 

Filtering may be done in many ways; one example is given in the
usage scenario. In that case, a matrix was produced that had an effect on the manner in
which the recommendations would be created. This type of effect can be described as
a scaling factor, which is described formally in the next section.

XII.A.1. Scaling Factors

As an example, the type of scaling factor that was used in the simple usage
scenario is described here. Recall that there was a rule to replace widget X for all widgets
in $S_X$. This rule induces a {0, 1}-valued function on the set $\{(i, j) \mid i, j \in I\}$: a value
of one if the rule holds between i and j, and a value of zero otherwise. Such functions
directly give rise to matrices that may be used as scaling factors.

In fact, a matrix that encodes a usual scaling of items to items may be used as a
scaling factor.

XII.A.2. Item Rejection/Injection

As a direct consequence of producing a model, there are multiple places to
perform item rejection or item injection. As described, recommendations can be made
from a class of models. Entries in these models may be removed, rejected, or have their
co-rate values modified in order to meet some rule-based policy to be enforced. As such,
the need to enforce such rules becomes less stringent at the runtime evaluation of the
recommendations. Such modifications at the model level can be performed in
either a reversible or irreversible manner. This is the choice of recommendation policy
that the recommendation engine will enforce. Note that any such preprocessing of data
does not rule out the possibility of runtime evaluation of rules that enforce the relevant
portions of a runtime recommendation policy.

XII.B. Generalized High Order State Spaces

All of the previous examples discussed here dealt with, at most, integer-valued
matrices. However, this does not have to be the case. There are three areas in which a
more general model will be of immediate utility and are consistent with the present
invention. These areas are the following: (i) "Not For Me:" a scenario in which clients of
a web site indicate that an item should not be recommended to them, which amounts to
incorporating negative feedback into the model-based prediction algorithms; (ii)
"Temporal Data:" where, in the descriptions of calculating the model, there has been no
use of time; and (iii) "Windowing:" a scenario in which the model reflects data collected
only after a certain date and one wants to maintain the model's accuracy as a running
model.

All three of these scenarios have a common structure that is described below.
One important aspect of this is the idea of pointing to a value of an entry of a matrix. That
is, one can separate the reference of the (i, j)-entry of a matrix from the value. As a
result, the value need not be integer-valued. Thus, this approach can be easily extended to
support vector-valued matrices or some other useful structure. Described below are
extensions in which the use of vector-valued matrices is both natural to consider and of
utility.

XII.B.1. "Not For Me"

The notion of "not for me" is basically a three-state model for the values of a
client's preference between any two items. In the initial formulation there were two states:
rated both and no information. The second state is referred to as no information
because, if $a \cdot b = 0$, then $\{a = 0, b = 1\}$, $\{a = 1, b = 0\}$, and $\{a = 0, b = 0\}$ are
indistinguishable states of a and b. If it is desirable to distinguish between these states
(gaining information), then one must extend the notion of $a \cdot b$ to take values in a larger
state space. In the case of introducing "not for me" this is exactly what one wants to do.

One may consider a state space of zero-one-valued triples, $\{0, 1\}^3$, whose entries
sum to zero or one. There are only four possibilities, $Q$ = {(0, 0, 0); (1, 0, 0); (0, 1, 0);
(0, 0, 1)}. Ratings are allowed to take values in $Q_{admissible}$ = {(0, 0, 0); (0, 0, 1); (0, 1, 0)}.
The value (1, 0, 0) is reserved to indicate disagreement. A dictionary for these
ratings values may be:

(0,0,0) indicates no information for the client. Of course, using sparse matrices, these
values are not stored;

(0,1,0) indicates that the client rated the item favorably; and
(0,0,1) indicates that the client rated the item "not for me."

The overloaded multiplication is defined by Table C:

Table C

     *      | (0,0,0) | (0,1,0) | (0,0,1) | (1,0,0)
  ----------+---------+---------+---------+---------
   (0,0,0)  | (0,0,0) | (0,0,0) | (0,0,0) |    -
   (0,1,0)  | (0,0,0) | (0,1,0) | (1,0,0) |    -
   (0,0,1)  | (0,0,0) | (1,0,0) | (0,0,1) |    -
   (1,0,0)  |    -    |    -    |    -    |    -

where the "-" indicates the non-admissibility of the ratings being "multiplied."
Summation of any two elements from $Q$ is performed coordinate-wise. The following
could equally have been defined over complex-valued matrices, but in practice the above
description would be more practical. If $R$ is a ratings matrix taking values in $Q_{admissible}$,
the set of admissible states, then $R^t * R$, formed with the multiplication of Table C and
coordinate-wise summation, is well defined
and defines our state matrix of co-rates.

Next is the issue of making recommendations. Previously, the approach was to
take the top co-rated entries of a row vector. If one wishes to use only one coordinate
from the state vector, this scheme will still work, but this new situation is far more
extensible than that simple situation. A possible scenario for distinguishing between state
vectors is given as follows.

Let $\Psi = (\psi_{(1,0,0)}, \psi_{(0,1,0)}, \psi_{(0,0,1)})$ denote the state vector from $R^t * R$, in which $Q$ is the
state space for ratings. Furthermore, let $f$ be a non-negative integer-valued function on
$Q$. For example, one may want to define cut-off thresholds. In order to properly define
such examples, consider the Heaviside function defined on the real values:

$$H(s, t) = \begin{cases} 1 & \text{if } s - t > 0 \\ 0 & \text{otherwise} \end{cases}$$

Now one can discuss functions such as

$$f_{(0,1,0)}(\Psi) = \psi_{(0,1,0)} \cdot H(k, \psi_{(0,0,1)}) \cdot H(k', \psi_{(1,0,0)})$$

This function evaluates to the favorable co-rate value, $\psi_{(0,1,0)}$, as long as
the "not for me" and "disagreement" values are not above the cut-off values $k$ and $k'$,
respectively.
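The state-valued co-rate calculation of this section can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: ratings are stored densely as triples, the multiplication follows Table C, summation is coordinate-wise, and the cut-off function uses the Heaviside form with the strict inequality s - t > 0. The constant and function names, the cut-off values, and the sample ratings are hypothetical.

```python
NO_INFO    = (0, 0, 0)
FAVORABLE  = (0, 1, 0)
NOT_FOR_ME = (0, 0, 1)
DISAGREE   = (1, 0, 0)   # reserved result state, never a stored rating

def state_mul(a, b):
    """Overloaded multiplication of two admissible ratings, following Table C."""
    if DISAGREE in (a, b):
        raise ValueError("(1, 0, 0) is not an admissible rating")
    if a == NO_INFO or b == NO_INFO:
        return NO_INFO
    return a if a == b else DISAGREE

def state_add(a, b):
    """Coordinate-wise summation of two state vectors."""
    return tuple(x + y for x, y in zip(a, b))

def state_corates(R):
    """State matrix of co-rates: entry (i, j) sums state_mul over all clients."""
    n = len(R[0])
    out = [[NO_INFO] * n for _ in range(n)]
    for row in R:
        for i in range(n):
            for j in range(n):
                out[i][j] = state_add(out[i][j], state_mul(row[i], row[j]))
    return out

def heaviside(s, t):
    return 1 if s - t > 0 else 0

def favorable_score(psi, k_not_for_me, k_disagree):
    """Favorable co-rate count, zeroed once either cut-off is reached."""
    disagree, favorable, not_for_me = psi
    return favorable * heaviside(k_not_for_me, not_for_me) * heaviside(k_disagree, disagree)

# Hypothetical ratings: 3 clients x 3 items.
R = [[FAVORABLE, FAVORABLE,  NOT_FOR_ME],
     [FAVORABLE, NOT_FOR_ME, NO_INFO],
     [NO_INFO,   FAVORABLE,  NOT_FOR_ME]]
S = state_corates(R)
print(S[0][1], favorable_score(S[0][1], 1, 2))
```

A sparse implementation would skip the `NO_INFO` entries entirely, since they contribute nothing to any sum.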

The above function is exemplary only, however, and in no way limits the present 
invention. 
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XII.B.2, Temporal Data 

Further still, one can consider the problem of making the state time dependent.
Recall that in Section VIII.A the incremental update for any change in the ratings was
computed.

Suppose that $S$ is defined to be real-valued pairs. The first entry stores the
increment, and the second entry stores the timestamp indicating when the update
occurred. Now, rather than have a matrix that only indicates the co-rate value in its
entries, one has a co-rate matrix whose entries are sequences in $S$ whose
coordinates are all zero for sufficiently large indices, which encodes that there
have been only a finite number of updates in any time interval. The first entry will store
the current value of the co-rate and the remaining entries of the sequence keep track of
the incremental updates. In this manner the temporal nature of the co-rating matrix has
been maintained.

XII.B.3. Windowing

Further still, in practice, the description of sequencing the incremental updates
will become storage intensive and uninteresting. As a result of maintaining timestamps, it
is possible to incrementally remove co-rate contributions that are too old. This may be
performed at the time of an incremental update by checking whether there are any
previous updates that are out of date. This is useful for item identifiers in the model that
have been reused to represent similar items. One does not want to remove all the ratings
for the item, because then the start-up problem exists. In this phasing of valid co-rates, one
eventually has co-rates that represent the proper item in the real world (as opposed to
this matrix representation of the world).
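The temporal and windowed co-rate entry described in the two preceding sections can be sketched as a small record that keeps the current value together with its timestamped increments. The class name `CoRateEntry`, its methods, and the numeric timestamps and window length are hypothetical; the sketch only illustrates expiring old contributions at the time of an incremental update.

```python
class CoRateEntry:
    """One co-rate matrix entry: current value plus its (increment, timestamp) history."""

    def __init__(self):
        self.value = 0
        self.updates = []   # (increment, timestamp) pairs, oldest first

    def apply(self, increment, timestamp, window=None):
        """Record an incremental update; optionally drop contributions older than the window."""
        self.value += increment
        self.updates.append((increment, timestamp))
        if window is not None:
            self.expire(timestamp - window)

    def expire(self, cutoff):
        """Remove every contribution whose timestamp is earlier than `cutoff`."""
        kept = []
        for increment, timestamp in self.updates:
            if timestamp < cutoff:
                self.value -= increment
            else:
                kept.append((increment, timestamp))
        self.updates = kept

entry = CoRateEntry()
entry.apply(1, timestamp=10, window=30)
entry.apply(1, timestamp=25, window=30)
entry.apply(1, timestamp=50, window=30)   # the update at time 10 is now out of date
print(entry.value, entry.updates)
```

Because expiry is checked only when a new update arrives, entries that never change again keep their old contributions; a periodic sweep calling `expire` would be one way to handle that case.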

XII.C. Generalized Data Properties

Throughout the discussion presented here, unary data, as defined herein, has been
the predominant data form. However, and in relation to the step of preprocessing data in
a preferred embodiment, "Interest Data" and "Likert Data" are discussed briefly.

XII.C.1. Interest Data

The fundamental fact that makes $R^tR$ and $RR^t$ calculations possible is that if
$a, b \in \{0,1\}$ then $ab = 1 \Leftrightarrow a = 1$ and $b = 1$. Note that $ab = \min(a, b)$. If one considers
the value "1" to mean interested in the item and the value "0" to mean no information,
then the co-rate contribution of two entries makes sense as their minimum. Consider
overloading the matrix multiplication operation (denoted with *):

$$(A * B)_{i,j} = \sum_{\ell} \min(A_{i,\ell}, B_{\ell,j})$$



All of the methods described are immediately extended to situations in which an
overloaded multiplication operation is used. Note that if the rating data is in fact unary,
this operation reduces to the unary algorithms described elsewhere in this document. This
fact has a direct consequence for the parallel dimensions algorithms. Suppose that one
has three dimensions, one dimension of which is unary and two dimensions of which are
interest. Recall that the cross dimension models were defined by

$${}_{(i,j)}M_r^{(k)} = \mathrm{Unary}_k\big((R^{(i)})^t R^{(j)}\big)$$

The cross dimension models may now be extended to

$${}_{(i,j)}M_r^{(k)} = \mathrm{Unary}_k\big((R^{(i)})^t * R^{(j)}\big)$$

Suppose that the j dimension is an interest dimension and the i dimension is unary;
then

$$\mathrm{Unary}_k\big((R^{(i)})^t * R^{(j)}\big) = \mathrm{Unary}_k\big((R^{(i)})^t\,\mathrm{Unary}(R^{(j)})\big)$$

This formula shows that in cross-dimension models involving a unary dimension,
the model will reduce to the cross dimension model where the interest dimension in
question has had a unary operator applied to make the data unary. This results in being
able to deal seamlessly with multiple dimensions, mixed between interest and unary
ratings.
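The reduction for a mixed unary and interest pair of dimensions can be checked with a short sketch. It assumes that interest values are non-negative integers with zero meaning no information, that the overloaded product sums the minimum of paired entries, and that the plain Unary operator maps every non-zero entry to one; the function names and data are hypothetical, and the top-k truncation is left out.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def min_matmul(a, b):
    """Overloaded product: sum of min(a, b) in place of sum of a * b."""
    cols = transpose(b)
    return [[sum(min(x, y) for x, y in zip(row, col)) for col in cols] for row in a]

def unary(a):
    """Map every non-zero entry to 1."""
    return [[1 if v > 0 else 0 for v in row] for row in a]

# Hypothetical data for the same 3 clients.
R_unary = [[1, 0],        # unary dimension (clients x items)
           [1, 1],
           [0, 1]]
R_interest = [[3, 0],     # interest dimension (clients x items), 0 = no information
              [2, 5],
              [0, 1]]

lhs = min_matmul(transpose(R_unary), R_interest)
rhs = matmul(transpose(R_unary), unary(R_interest))
print(lhs == rhs)
```

The equality holds because the minimum of a unary entry and a positive interest value is the unary entry itself, so the interest magnitudes never reach the model.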

XII.C.2. Likert-Binary Data

The approach to dealing with Likert data has been to make unary ratings data
from the Likert data by using a binary cut-off. That is, values above a predefined
threshold become one and all other values become zero. This threshold may be set on a
client-by-client basis.
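The binary cut-off can be sketched in a few lines. The function name `likert_to_unary`, the rating scale, and the per-client thresholds below are hypothetical; zero is assumed to mean that no rating was given.

```python
def likert_to_unary(R, thresholds):
    """Values strictly above the client's threshold become 1; everything else becomes 0."""
    return [[1 if value > threshold else 0 for value in row]
            for row, threshold in zip(R, thresholds)]

# Hypothetical Likert ratings (2 clients x 3 items) and one threshold per client.
R_likert = [[5, 3, 0],
            [2, 4, 1]]
thresholds = [3, 1]

R_unary = likert_to_unary(R_likert, thresholds)
print(R_unary)
```

Using one threshold per client lets a habitually harsh rater and a habitually generous rater both contribute meaningful "yes" entries to the unary model.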

XIII. EXAMPLES OF SPARSE MATRIX CALCULATIONS 

One aspect of the present invention has been the focus on the mathematical
formulation for calculating co-occurrence in ratings data. The formulation disclosed used
standard mathematical techniques in order to derive formulas that have utility for making
fast, accurate recommendations. Since, most often, the data from which the
recommender system will need to compute is extremely sparse in its matrix
representation, it is further useful to turn attention towards some numerical analysis
aspects. An example of a numerical analysis kit which may be used with the present
invention is SPARSEKIT2 (available from the University of Minnesota, Department of
Computer Science and Engineering, Minneapolis, MN,
<ftp://ftp.cs.umn.edu/dept/sparse/>).

XIII.A. Compressed Sparse Row Format (CSR)

The matrices that have been discussed are such that the non-zero entries on a row
represent a client's ratings. As such, it is not surprising that one desires a representation
for the computation that favors a viewpoint that is efficient for accessing the non-zero
entries of a row. Such accesses are not the only operations that are required. Matrix
multiplication is a fundamental operation and hence one will need this operation to be
efficient. The compressed sparse row format, described below, is quite suitable for
computation of the calculations discussed earlier. For a more thorough description of this
and other formats the reader is directed to the documentation of the SPARSEKIT2
package.

The data structure used to represent a compressed sparse row formatted matrix
consists of three arrays: (i) an array containing the non-zero entries of the matrix; (ii) an
array containing the column positions of these non-zero entries; and (iii) an array
containing pointers into the previous arrays corresponding to the beginning of each row
of the matrix.

As an example, suppose that the matrix $R$ in its standard representation is given
by

$$R = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Consider that the following arrays are zero based. FIG. 10 shows three arrays that
represent the matrix $R$ in compressed-row format. Note that the last value of the array of
row pointers is a reference to where the next row would begin if there were a next row. In
essence, it encodes how many non-zero entries exist on the last row.
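The three arrays can be derived mechanically from the example matrix. The following is a minimal sketch using zero-based indices; the function names `to_csr` and `row_entries` are hypothetical, and the printed arrays are computed from the matrix above rather than copied from FIG. 10.

```python
def to_csr(matrix):
    """Return (values, columns, row_ptr) for a dense matrix, using zero-based indices."""
    values, columns, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                columns.append(j)
        # After each row, record where the next row would begin.
        row_ptr.append(len(values))
    return values, columns, row_ptr

def row_entries(csr, i):
    """Non-zero (column, value) pairs of row i, read directly from the CSR arrays."""
    values, columns, row_ptr = csr
    return [(columns[p], values[p]) for p in range(row_ptr[i], row_ptr[i + 1])]

R = [[0, 1, 0, 0, 1],
     [1, 0, 1, 1, 0],
     [0, 0, 0, 0, 1],
     [1, 0, 0, 0, 1]]

csr = to_csr(R)
values, columns, row_ptr = csr
print(values)
print(columns)
print(row_ptr)
print(row_entries(csr, 1))
```

The difference between consecutive row pointers gives the number of ratings for each client, and the final pointer equals the total number of non-zero entries.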



XIV. CONCLUSION 

Methods and apparatus consistent with the present invention can be used to
provide rapid, accurate, preference recommendations to a client. The foregoing
description of an implementation of the invention has been presented for purposes of
illustration and description. It is not exhaustive and does not limit the invention to the
precise form disclosed. Modifications and variations are possible in light of the above
teachings or may be acquired from practicing the invention. For example, although the
runtime recommendation system and off-line recommendation system were depicted as
separated by a network (wireless or otherwise), such a depiction was exemplary only.
One skilled in the art should appreciate that runtime recommendation system and off-line
recommendation system may form different processing and memory portions of the same
data processing device. Accordingly, the invention is not limited to the above described
embodiments, but instead is defined by the appended claims in light of their full scope of
equivalents.